




THE ANNALS 

of 

MATHEMATICAL 

STATISTICS 


The Annals of Mathematical Statistics is affiliated
with the American Statistical Association and is
devoted to the theory and application of
mathematical statistics


Editorial Committee 
H. C. CARVER 
A. L. O’TOOLE 
T. E. RAIFORD 


Volume VII, 1936 



PUBLISHED QUARTERLY 
ANN ARBOR, MICHIGAN




ON THE FREQUENCY FUNCTION OF xy 

By Cecil C. Craig 

Given the distribution function of x and y, what can be said of the distribution 
of the product xy? The author has had two inquiries during the last two years, 
one from an investigator in business statistics and the other from a psychologist, 
concerning the probable error of the product of two quantities, each of known 
probable error. There seems to be very little in the literature of mathematical 
statistics on this question. 

If x and y are independent and are each distributed according to the same
normal frequency law, it is well known¹ that the distribution function of

\[ z = \frac{x - m_x}{\sigma_x} \cdot \frac{y - m_y}{\sigma_y} \]

is

\[ \frac{K_0(z)}{\pi}, \]

in which $K_0(z)$ is the Bessel function of the second kind of a purely imaginary
argument of zero order.² If x and y are independent and are each distributed
according to a logarithmic normal frequency law, it has been pointed out that
the product, $(x - a)(y - b)$, in which a and b are the upper (or lower) limits
of the range for x and y respectively, is distributed according to a law of the
same type.³ In both cases the special choice of origins greatly simplifies the
problem.

In the present discussion it will be assumed that x and y are distributed
normally. It will appear that the distribution of xy is a function of $r_{xy}$, the
coefficient of correlation between x and y, and of the parameters,

\[ \rho_1 = \frac{m_1}{\sigma_1} = \frac{m_x}{\sigma_x} \quad\text{and}\quad \rho_2 = \frac{m_2}{\sigma_2} = \frac{m_y}{\sigma_y}, \]

which are proportional to the reciprocals of the coefficients of variation. The
chief difficulty arises when $\rho_1$ and $\rho_2$ are small so that zero values of xy occur
for values of x and y well within their respective ranges of variation. (If $\rho_1$
and $\rho_2$ are large, practically one may exclude zero values of x and y from
consideration. The author hopes to present an investigation of this case soon.)
It is the object of the present paper to study the rather unusual frequency
function that arises in this situation. It will first be assumed that x and y
are independent ($r_{xy} = r = 0$). Then it will be shown that the distribution
function when $r \neq 0$ is readily derived from that arrived at in the special case.

¹ J. Wishart and M. S. Bartlett: The Distribution of Second Order Moment Statistics
in a Normal System; Proceedings of the Cambridge Philosophical Society, Vol. XXVIII
(1932), pp. 455-459.

² G. N. Watson: A Treatise on the Theory of Bessel Functions; Cambridge University
Press (1922), p. 78.

³ P. T. Yuan: On the Logarithmic Frequency Distribution and the Semi-logarithmic
Frequency Surface; Annals of Mathematical Statistics, Vol. 4 (1933), pp. 46, 47.

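We can find the moment generating function of xy without difficulty. We
have,

\[ M_{xy}(t) = \frac{1}{2\pi\sigma_1\sigma_2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{txy}\, e^{-\frac{(x-m_1)^2}{2\sigma_1^2}-\frac{(y-m_2)^2}{2\sigma_2^2}}\,dx\,dy = \frac{e^{\left[(m_1^2\sigma_2^2+m_2^2\sigma_1^2)t^2+2m_1m_2t\right]/2(1-\sigma_1^2\sigma_2^2t^2)}}{(1-\sigma_1^2\sigma_2^2t^2)^{1/2}}. \]

Setting, for convenience,

\[ z = \frac{xy}{\sigma_1\sigma_2}, \qquad \vartheta = \sigma_1\sigma_2 t, \]

this can be written,

\[ M_z(\vartheta) = \frac{e^{\left[(\rho_1^2+\rho_2^2)\vartheta^2+2\rho_1\rho_2\vartheta\right]/2(1-\vartheta^2)}}{(1-\vartheta^2)^{1/2}}. \tag{1} \]

This choice of variable and of parameters will be adhered to in the sequel.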

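On expanding $\log M_z(\vartheta)$ in powers of $\vartheta$, we get for the semi-invariants (of
Thiele),

\[ \lambda_{2k+1:z} = (2k+1)!\,\rho_1\rho_2, \qquad k = 0, 1, 2, \cdots \]

\[ \lambda_{2k:z} = \frac{(2k)!}{2}(\rho_1^2+\rho_2^2) + (2k-1)!, \qquad k = 1, 2, \cdots. \]

These give for the mean and variance of xy,

\[ M_{xy} = m_1m_2, \qquad \sigma_{xy}^2 = \sigma_1^2m_2^2 + \sigma_2^2m_1^2 + \sigma_1^2\sigma_2^2. \]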
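
The relation between the semi-invariants of z and the mean and variance of xy can be checked numerically. The following is only a minimal sketch, not part of the original paper; the function names are illustrative assumptions, and the formulas are those stated above for independent normal x and y.

```python
import math

def semi_invariant(n, rho1, rho2):
    """n-th semi-invariant of z = xy/(sigma1*sigma2) when r = 0."""
    if n % 2 == 1:
        return math.factorial(n) * rho1 * rho2
    return math.factorial(n) / 2 * (rho1**2 + rho2**2) + math.factorial(n - 1)

def product_mean_variance(m1, s1, m2, s2):
    """Mean and variance of xy for independent normal x and y."""
    mean = m1 * m2
    variance = s1**2 * m2**2 + s2**2 * m1**2 + s1**2 * s2**2
    return mean, variance

# Example values (arbitrary): x ~ N(2, 0.5^2), y ~ N(3, 1.5^2).
m1, s1, m2, s2 = 2.0, 0.5, 3.0, 1.5
rho1, rho2 = m1 / s1, m2 / s2
mean, variance = product_mean_variance(m1, s1, m2, s2)

# Rescaling the first two semi-invariants of z by sigma1*sigma2
# should reproduce the mean and variance of xy.
mean_from_z = s1 * s2 * semi_invariant(1, rho1, rho2)
variance_from_z = (s1 * s2) ** 2 * semi_invariant(2, rho1, rho2)
```

Both routes should agree, since $\lambda_{1:z} = \rho_1\rho_2$ and $\lambda_{2:z} = \rho_1^2 + \rho_2^2 + 1$.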

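For the standard semi-invariants of z (or of xy), we have,

\[ \gamma_{2k+1:z} = \frac{\lambda_{2k+1:z}}{\lambda_{2:z}^{(2k+1)/2}} = \frac{(2k+1)!\,\rho_1\rho_2}{(\rho_1^2+\rho_2^2+1)^{(2k+1)/2}}, \]

\[ \gamma_{2k:z} = \frac{\lambda_{2k:z}}{\lambda_{2:z}^{k}} = \frac{(2k-1)!\,[k(\rho_1^2+\rho_2^2)+1]}{(\rho_1^2+\rho_2^2+1)^{k}}. \]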





Taking,

\[ \gamma_{3:z} = \frac{6\rho_1\rho_2}{(\rho_1^2+\rho_2^2+1)^{3/2}}, \]

as a measure of skewness, it is easy to verify that

\[ |\gamma_{3:z}| \le \tfrac{2}{3}\sqrt{3}. \]

For either $\rho_1 = 0$ or $\rho_2 = 0$, the distribution is symmetrical about its mean
which then falls at the origin.

For the excess or kurtosis, we have,

\[ \gamma_{4:z} = \frac{6[2(\rho_1^2+\rho_2^2)+1]}{(\rho_1^2+\rho_2^2+1)^2} > 0. \]

Thus the skewness is never great and becomes small with increasing $\rho_1$ or $\rho_2$.
The excess also becomes small with increasing $\rho_1$ or $\rho_2$, but it can be very large
for small values of these parameters, attaining its maximum of 6 for $\rho_1 = \rho_2 = 0$.
But, as it will appear below, the distribution function always becomes infinite
in a logarithmic manner at the origin. (We have already seen, as must obviously
be the case, that moments of all orders exist.) It is to be noted, too,
that for any given $\rho_1$ and $\rho_2$, $\gamma_{2k:z}$ increases without limit with increasing k, and
that the same is true of $\gamma_{2k+1:z}$ if neither $\rho_1$ nor $\rho_2$ is zero.
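
The bounds on the skewness and excess stated above can be illustrated numerically. The sketch below is not from the original paper; it simply scans a grid of $(\rho_1, \rho_2)$ values (the grid and the function names are assumptions made for illustration) using the closed forms for $\gamma_{3:z}$ and $\gamma_{4:z}$ in the uncorrelated case.

```python
import math

def gamma3(rho1, rho2):
    """Standard third semi-invariant (skewness) of z for r = 0."""
    return 6 * rho1 * rho2 / (rho1**2 + rho2**2 + 1) ** 1.5

def gamma4(rho1, rho2):
    """Standard fourth semi-invariant (excess) of z for r = 0."""
    s = rho1**2 + rho2**2
    return 6 * (2 * s + 1) / (s + 1) ** 2

# Illustrative grid from -5 to 5 in steps of 0.1 (an arbitrary choice).
grid = [i / 10 for i in range(-50, 51)]
max_abs_skew = max(abs(gamma3(a, b)) for a in grid for b in grid)
max_excess = max(gamma4(a, b) for a in grid for b in grid)
```

On this grid the largest absolute skewness occurs at $|\rho_1| = |\rho_2| = 1$ and equals $\tfrac{2}{3}\sqrt{3}$, while the largest excess, 6, occurs at the origin.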

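Turning now to the derivation of the actual frequency function of z, we set
w = xy; then for any given x, y = w/x, $dy = \frac{dw}{x}$ if x > 0, and $dy = -\frac{dw}{x}$
if x < 0. These values are substituted into $\varphi_1(x)\,\varphi_2(y)\,dx\,dy$, in which $\varphi_1(x)$ and
$\varphi_2(y)$ are the frequency functions of x and y respectively, and the resulting
expression is integrated over all values of x, giving for the frequency function
of w:

\[ F(w) = \frac{1}{2\pi\sigma_1\sigma_2}\left[\int_0^{\infty} e^{-h(w,x)}\,\frac{dx}{x} - \int_{-\infty}^{0} e^{-h(w,x)}\,\frac{dx}{x}\right], \]

in which,

\[ h(w,x) = \frac{\sigma_2^2x^4 - 2m_1\sigma_2^2x^3 + (m_1^2\sigma_2^2+m_2^2\sigma_1^2)x^2 - 2m_2\sigma_1^2wx + \sigma_1^2w^2}{2\sigma_1^2\sigma_2^2x^2}. \]

Again setting $z = \frac{xy}{\sigma_1\sigma_2}$, and introducing the parameters $\rho_1$ and $\rho_2$, this reduces to,

\[ F(z) = \frac{e^{-\frac{\rho_1^2+\rho_2^2}{2}}}{2\pi}\left[\psi_1(z) - \psi_2(z)\right], \tag{4} \]

in which,

\[ \psi_1(z) = \int_0^{\infty} e^{-\frac{x^2}{2}+\rho_1x-\frac{z^2}{2x^2}+\rho_2\frac{z}{x}}\,\frac{dx}{x}, \tag{5} \]

and $\psi_2(z)$ is the integral of the same function over the interval $(-\infty, 0)$.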
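Now writing,

\[ \psi_1(z) = \int_0^{\infty} e^{-\frac{x^2}{2}-\frac{z^2}{2x^2}}\,\frac{e^{\rho_1x+\rho_2\frac{z}{x}}}{x}\,dx, \tag{6} \]

we note that

\[ \frac{e^{\rho_1x+\rho_2\frac{z}{x}}}{x} \]

can be expanded in a Laurent series in powers of x for all values of x except zero.

In this expansion the coefficient of $x^{r-1}$, $r \ge 1$, is $\frac{\rho_1^r}{r!}\,\Sigma_r(\rho_1\rho_2z)$, in which

\[ \Sigma_r(t) = 1 + \frac{t}{r+1} + \frac{t^2}{(r+2)^{(2)}\,2!} + \frac{t^3}{(r+3)^{(3)}\,3!} + \cdots, \tag{7} \]

\[ \left((r+k)^{(k)} = (r+k)(r+k-1)\cdots(r+1)\right). \]

We may note parenthetically that

\[ \Sigma_r(\rho_1\rho_2z) = \frac{r!\,I_r(2\sqrt{\rho_1\rho_2z})}{(\rho_1\rho_2z)^{r/2}}, \]

in which $I_r(x)$ is the Bessel function of the first kind with a purely imaginary
argument.⁴

The coefficient of $x^{-r-1}$, $r \ge 0$, is $\frac{(\rho_2z)^r}{r!}\,\Sigma_r(\rho_1\rho_2z)$.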

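Setting now,

\[ \sum_{n=-\infty}^{\infty} f_n(x) = \frac{e^{\rho_1x+\rho_2\frac{z}{x}}}{x}, \]

we substitute this series in (6) and seek to justify the expansion it gives for
$\psi_1(z)$ obtained by term by term integration. We write,

\[ \psi_1(z) = \int_0^{1} e^{-\frac{x^2}{2}-\frac{z^2}{2x^2}}\sum f_n(x)\,dx + \int_1^{\infty} e^{-\frac{x^2}{2}-\frac{z^2}{2x^2}}\sum f_n(x)\,dx. \]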


⁴ Watson, loc. cit., p. 77.





For z > 0, $\rho_1\rho_2 > 0$, the terms of $\sum f_n(x)$ are all > 0. Then the convergence of

\[ \sum \int_0^{1} e^{-\frac{x^2}{2}-\frac{z^2}{2x^2}} f_n(x)\,dx \]

is sufficient to allow term by term integration in the first integral. In the second
integral we observe that $\sum f_n(x)$ converges uniformly in every fixed interval
$1 \le x \le a$. Then term by term integration is permissible here if

\[ \sum \int_1^{\infty} e^{-\frac{x^2}{2}-\frac{z^2}{2x^2}} f_n(x)\,dx \]

is convergent.⁵ It is evident, then, that it will be sufficient to establish the
convergence of

\[ \sum \int_0^{\infty} e^{-\frac{x^2}{2}-\frac{z^2}{2x^2}} f_n(x)\,dx. \]

If either or both z < 0 or $\rho_1\rho_2 < 0$, it will be easily seen that the series involved
are still absolutely convergent, which is sufficient.

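Now using the definition of the Bessel function of a purely imaginary argument
of the second kind,⁶

\[ K_\nu(z) = \frac{1}{2}\left(\frac{z}{2}\right)^{\nu}\int_0^{\infty} e^{-t-\frac{z^2}{4t}}\,\frac{dt}{t^{\nu+1}}, \]

it is easy to derive the relation,

\[ \frac{K_\nu(z)}{z^{\nu}} = \int_0^{\infty} e^{-\frac{x^2}{2}-\frac{z^2}{2x^2}}\,\frac{dx}{x^{2\nu+1}}. \]

Remembering that $K_\nu(z) = K_{-\nu}(z)$, we have for our expansion,

\[ \psi_1(z) = \Sigma_0K_0 + (\rho_1+\rho_2)\frac{z^{1/2}}{1!}\Sigma_1K_{1/2} + (\rho_1^2+\rho_2^2)\frac{z}{2!}\Sigma_2K_1 + (\rho_1^3+\rho_2^3)\frac{z^{3/2}}{3!}\Sigma_3K_{3/2} + \cdots \]

in which the argument for all the $\Sigma$-functions is $\rho_1\rho_2z$, and for all the K-functions
is z.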


⁵ T. J. I'A. Bromwich: An Introduction to the Theory of Infinite Series; Macmillan & Co.,
London, 2nd edition (1926), pp. 496 and 500.

⁶ Watson, loc. cit., pp. 78 and 183.





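But we may as well add to this the expansion of $-\psi_2(z)$, which may be
written,

\[ -\psi_2(z) = \int_0^{\infty} e^{-\frac{x^2}{2}-\frac{z^2}{2x^2}}\,\frac{e^{-\rho_1x-\rho_2\frac{z}{x}}}{x}\,dx, \]

and obtain the expansion,

\[ F(z) = \frac{e^{-\frac{\rho_1^2+\rho_2^2}{2}}}{\pi}\left[\Sigma_0K_0 + (\rho_1^2+\rho_2^2)\frac{z}{2!}\Sigma_2K_1 + (\rho_1^4+\rho_2^4)\frac{z^2}{4!}\Sigma_4K_2 + \cdots\right], \tag{8} \]

the convergence of which we will examine. But it must be noted that the
terms arising from the expansion of

\[ \frac{e^{\rho_1x+\rho_2\frac{z}{x}}}{x} \quad\text{and}\quad \frac{e^{-\rho_1x-\rho_2\frac{z}{x}}}{x} \]

which contribute to the expansion of F(z) as just written are those of the forms,

\[ \frac{\rho_1^{2k}}{(2k)!}\,x^{2k-1}\,\Sigma_{2k} \quad\text{and}\quad \frac{\rho_2^{2k}z^{2k}}{(2k)!}\,x^{-2k-1}\,\Sigma_{2k}. \]

Hence the expansion as written is valid in any case only for z > 0. For z < 0,
we may write however,

\[ F(z) = \frac{e^{-\frac{\rho_1^2+\rho_2^2}{2}}}{\pi}\left[\Sigma_0K_0 - (\rho_1^2+\rho_2^2)\frac{z}{2!}\Sigma_2K_1 + (\rho_1^4+\rho_2^4)\frac{z^2}{4!}\Sigma_4K_2 - \cdots\right], \]

in which the arguments for the $\Sigma$- and K-functions are the same as before.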

Let us consider now the question of the convergence of (8), first in the case 
that 2 > 0. We set 

v\ / (7- 1)1 • 

Then from the relation,⁷

\[ K_{\nu-1} - K_{\nu+1} = -\frac{2\nu}{z}K_\nu, \tag{9} \]

we readily derive, 

02 ( 2^ \ 

(^+1)«> “ V'^‘" ^ + 1/' 


⁷ Watson, loc. cit., p. 79.





For z > 0, the left hand member and are both > 0 . Thus 


Then let 




2v 

.+ 1 


> 0 . 


= —I —- + > 0 , 

1^+1 

and we have, 

(iTTiyu; > ~ I)^ • 

It is evident from this that for a given z > 0, a j/q exists such that Cy < 3 
for V ^ vq. 

Further since 

E I Pipi* I 
r ^ e 

the convergence sought follows for z > 0. Since K is an even function of z,
it is easy to see that (8) is also convergent for z < 0. For z = 0, the first term
possesses a logarithmic discontinuity at the origin.

To calculate ordinates of F(z) there are fairly extensive tables available in
Watson's treatise already referred to. These tables may be readily extended
by means of the asymptotic formula for $K_\nu(z)$ for larger values of z, and by means
of (9) for larger values of $\nu$. One can rapidly build up tables of $\Sigma_r(x)$ by means
of the easily obtained recursion formula,

\[ \Sigma_r(x) = \Sigma_{r+1}(x) + \frac{x}{(r+1)(r+2)}\,\Sigma_{r+2}(x). \]
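
The recursion formula for the $\Sigma$-functions can be checked directly from the defining series (7). The sketch below is illustrative only and is not the author's tabulation procedure; the function name `sigma` and the fixed number of series terms are assumptions.

```python
def sigma(r, t, terms=60):
    """Sum the series (7): sum over j of t^j / ((r+j)(r+j-1)...(r+1) * j!)."""
    total = 0.0
    term = 1.0  # the j = 0 term
    for j in range(terms):
        total += term
        # Ratio of consecutive terms: t / ((r + j + 1) * (j + 1)).
        term *= t / ((r + j + 1) * (j + 1))
    return total

def recursion_gap(r, t):
    """Difference between the two sides of the recursion formula."""
    right = sigma(r + 1, t) + t / ((r + 1) * (r + 2)) * sigma(r + 2, t)
    return sigma(r, t) - right

# A few illustrative orders and arguments, including a negative argument.
gaps = [abs(recursion_gap(r, t)) for r in (0, 1, 2, 5) for t in (-3.0, 0.1, 1.0, 3.0)]
```

Since the series converges very rapidly for arguments of moderate size, a few dozen terms are far more than is needed here.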

It is unfortunately true that the expansion found for F(z) is very slowly
convergent for large values of $\rho_1$ and $\rho_2$.

At the end of this paper are shown three charts of F(z) with the tables of 
ordinates from which they were made by way of illustrating what such curves 
look like. (On the second for comparison the broken line is the normal curve 
of error.) 

For $\rho_1 = \rho_2 = r = 0$, we have simply the known result,

\[ F(z) = \frac{1}{\pi}K_0(z). \]

For $\rho_1 = 1$, $\rho_2 = r = 0$, the curve is symmetrical about its mean (and the
origin). Here every $\Sigma$-function is unity.

For the case in which $\rho_1 = \rho_2 = \tfrac{1}{2}$, r = 0, I first constructed tables of $\Sigma_i(x)$
for x = ±0.025, ±0.05, ±0.1, and by intervals of 0.1 to ±3.0 for i = 0, 1, ···,
20. Values of $\Sigma_0(x)$ and $\Sigma_2(x)$ for x = 3.2 and 3.4 were also used. Not
more than five terms of (8) were required to obtain values of F(z) accurate
to five places of decimals. This distribution curve is skew with $M_z = 0.25$
and $\gamma_{3:z} = \frac{\sqrt{6}}{3}$.

The curves are plotted in standard units with unit total area
($\sigma_z = \sqrt{\rho_1^2+\rho_2^2+1}$). The tables of ordinates are given both in units of

\[ z = \frac{xy}{\sigma_1\sigma_2} \quad\text{and of}\quad t = \frac{z - M_z}{\sigma_z}. \]
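
For readers who wish to reproduce a few ordinates, the series (8) can be summed numerically. The sketch below is not the author's computing method: it obtains $K_0$ and $K_1$ from the integral $K_\nu(z) = \int_0^\infty e^{-z\cosh t}\cosh\nu t\,dt$ by a simple trapezoidal rule (an assumption made here to avoid dependence on tables), steps upward with the recurrence (9), and evaluates the K-functions at |z| for negative z, in line with the remark that K is treated as even. All function names are illustrative.

```python
import math

def bessel_k(order, z, step=0.01, upper=12.0):
    """K_order(z) for z > 0 by the trapezoidal rule on the cosh-integral."""
    total = 0.5 * math.exp(-z)  # integrand at t = 0
    for i in range(1, int(upper / step) + 1):
        t = i * step
        total += math.exp(-z * math.cosh(t)) * math.cosh(order * t)
    return total * step

def sigma(r, t, terms=60):
    """Series (7) for the Sigma-function of order r."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= t / ((r + j + 1) * (j + 1))
    return total

def craig_density(z, rho1, rho2, n_terms=12):
    """Ordinate F(z) from series (8), uncorrelated case, z != 0."""
    if z == 0:
        raise ValueError("F(z) is logarithmically infinite at z = 0")
    az = abs(z)
    arg = rho1 * rho2 * z
    k_prev, k_curr = bessel_k(0, az), bessel_k(1, az)  # K_0 and K_1
    total = sigma(0, arg) * k_prev
    for k in range(1, n_terms + 1):
        coeff = (rho1 ** (2 * k) + rho2 ** (2 * k)) * az**k / math.factorial(2 * k)
        total += coeff * sigma(2 * k, arg) * k_curr
        # Recurrence (9): K_{k+1} = K_{k-1} + (2k / z) K_k.
        k_prev, k_curr = k_curr, k_prev + (2 * k / az) * k_curr
    return math.exp(-(rho1**2 + rho2**2) / 2) / math.pi * total

f_a = craig_density(0.1, 0.0, 0.0)   # compare with the tabulated 0.77256
f_b = craig_density(1.0, 1.0, 0.0)   # compare with the tabulated 0.15460
f_c = craig_density(0.4, 0.5, 0.5)   # compare with the tabulated 0.36322
f_d = craig_density(-0.4, 0.5, 0.5)  # compare with the tabulated 0.30423
```

The four values can be set beside the corresponding entries in the tables of ordinates given at the end of the paper.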

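Turning now to the case in which $r \neq 0$, after some computation, we have
for the moment generating function,

\[ M_z(\vartheta) = \frac{\exp\left\{\dfrac{(\rho_1^2+\rho_2^2-2r\rho_1\rho_2)\vartheta^2+2\rho_1\rho_2\vartheta}{2[1-(1+r)\vartheta][1+(1-r)\vartheta]}\right\}}{\sqrt{[1-(1+r)\vartheta][1+(1-r)\vartheta]}}. \tag{10} \]

As a check on this result, if we set r = 1 and $\rho_1 = \rho_2 = \rho$ in it we get,

\[ \frac{e^{\frac{\rho^2\vartheta}{1-2\vartheta}}}{\sqrt{1-2\vartheta}}, \]

which may be readily verified to be the moment generating function of $\frac{x^2}{\sigma^2}$
if x is distributed normally with mean m and variance $\sigma^2$ $\left(\rho = \frac{m}{\sigma}\right)$.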

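To obtain the semi-invariants of z in this case, on expanding $\log M_z(\vartheta)$ in
powers of $\vartheta$, setting

\[ a = \rho_1^2+\rho_2^2-2\rho_1\rho_2r, \quad b = 2\rho_1\rho_2, \quad c = 1+r, \quad\text{and}\quad d = 1-r, \]

we have,

\[ \log M_z(\vartheta) = \frac{a\vartheta^2+b\vartheta}{2(1-c\vartheta)(1+d\vartheta)} - \frac{1}{2}\left[\log(1-c\vartheta)+\log(1+d\vartheta)\right] \]

\[ = \frac{a\vartheta^2+b\vartheta}{4}\left[2+(c^2-d^2)\vartheta+(c^3+d^3)\vartheta^2+(c^4-d^4)\vartheta^3+\cdots\right] + \frac{1}{2}\left[(c-d)\vartheta+(c^2+d^2)\frac{\vartheta^2}{2}+(c^3-d^3)\frac{\vartheta^3}{3}+\cdots\right], \tag{11} \]

from which we derive,

\[ \lambda_{n:z} = \frac{n!}{4}\left[\{c^{n-1}-(-d)^{n-1}\}\,a + \{c^n-(-d)^n\}\,b\right] + \frac{(n-1)!}{2}\{c^n+(-d)^n\}. \tag{12} \]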


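In particular,

\[ \lambda_{1:z} = \frac{b}{2} + \frac{c-d}{2} = \rho_1\rho_2 + r, \]

\[ \lambda_{2:z} = a + \frac{c^2-d^2}{2}\,b + \frac{c^2+d^2}{2} = \rho_1^2+\rho_2^2+2\rho_1\rho_2r+(1+r^2), \]

\[ \lambda_{3:z} = \frac{3}{2}\left[(c^2-d^2)a+(c^3+d^3)b\right] + c^3-d^3 = 6\left[(\rho_1^2+\rho_2^2)r+\rho_1\rho_2(1+r^2)\right] + 2r(3+r^2), \tag{13} \]

\[ \lambda_{4:z} = 6\left[(c^3+d^3)a+(c^4-d^4)b\right] + 3(c^4+d^4) = 12(\rho_1^2+\rho_2^2)(1+3r^2) + 24\rho_1\rho_2r(3+r^2) + 6(1+6r^2+r^4). \]

Noting that

\[ \frac{\partial a}{\partial r} = -b, \qquad \frac{\partial c}{\partial r} = 1, \qquad \frac{\partial d}{\partial r} = -1, \]

one can easily demonstrate what seems to be a rather striking property of these
semi-invariants, viz.,

\[ \frac{\partial\lambda_{n:z}}{\partial r} = n(n-1)\,\lambda_{n-1:z}. \tag{14} \]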
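
The general formula (12), the special cases (13), and the property (14) can be cross-checked numerically. The following is a minimal sketch and not part of the original paper; the function name `lam`, the sample parameter values, and the use of a central finite difference for the derivative in r are all assumptions made for illustration.

```python
import math

def lam(n, rho1, rho2, r):
    """n-th semi-invariant of z from formula (12)."""
    a = rho1**2 + rho2**2 - 2 * rho1 * rho2 * r
    b = 2 * rho1 * rho2
    c, d = 1 + r, 1 - r
    first = math.factorial(n) / 4 * (
        (c ** (n - 1) - (-d) ** (n - 1)) * a + (c**n - (-d) ** n) * b
    )
    second = math.factorial(n - 1) / 2 * (c**n + (-d) ** n)
    return first + second

# Arbitrary illustrative parameter values.
rho1, rho2, r = 0.7, -1.3, 0.4
s, p = rho1**2 + rho2**2, rho1 * rho2

# Closed forms (13).
closed = {
    1: p + r,
    2: s + 2 * p * r + 1 + r**2,
    3: 6 * (s * r + p * (1 + r**2)) + 2 * r * (3 + r**2),
    4: 12 * s * (1 + 3 * r**2) + 24 * p * r * (3 + r**2) + 6 * (1 + 6 * r**2 + r**4),
}
closed_gaps = [abs(lam(n, rho1, rho2, r) - closed[n]) for n in closed]

# Property (14): d(lambda_n)/dr = n (n - 1) lambda_{n-1}, by central difference.
h = 1e-4
derivative_gaps = []
for n in (2, 3, 4, 5):
    numeric = (lam(n, rho1, rho2, r + h) - lam(n, rho1, rho2, r - h)) / (2 * h)
    derivative_gaps.append(abs(numeric - n * (n - 1) * lam(n - 1, rho1, rho2, r)))
```

The finite-difference comparison is only approximate, so agreement to a few decimal places is all that should be expected from it.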


To gain a notion of the magnitude of the skewness and excess in this case,
we form,

\[ \gamma_{3:z} = \frac{\lambda_{3:z}}{\lambda_{2:z}^{3/2}} \quad\text{and}\quad \gamma_{4:z} = \frac{\lambda_{4:z}}{\lambda_{2:z}^{2}}. \]

In view of the above property,

\[ \frac{\partial\gamma_{3:z}}{\partial r} = \frac{6\lambda_{2}^{2} - 3\lambda_{3}\lambda_{1}}{\lambda_{2}^{5/2}}. \]

The denominator of this fraction is always > 0. The numerator, after some
reduction, can be written,

\[ 6\left[\rho_1^4+\rho_2^4-\rho_1^2\rho_2^2(1-r^2) + (\rho_1^2+\rho_2^2)(2-r^2) + (\rho_1^2+\rho_2^2-2)\rho_1\rho_2r + 1-r^2\right]. \tag{15} \]

The first two terms taken together, the third, and the last are all obviously ≥ 0.
The term,

\[ (\rho_1^2+\rho_2^2-2)\rho_1\rho_2r \]

has its maximum value for |r| = 1. But for r = 1, (15) becomes,

\[ \rho_1^4+\rho_2^4+\rho_1\rho_2(\rho_1^2+\rho_2^2)+(\rho_1-\rho_2)^2, \]

and for r = −1, it is,

\[ \rho_1^4+\rho_2^4-\rho_1\rho_2(\rho_1^2+\rho_2^2)+(\rho_1+\rho_2)^2, \]

both of which expressions are easily seen to be ≥ 0.

Thus (15) is always positive and the maximum value of $\gamma_{3:z}$ is attained for
r = 1, the minimum value for r = −1. These values are respectively,

\[ \frac{6(\rho_1+\rho_2)^2+8}{[(\rho_1+\rho_2)^2+2]^{3/2}} \quad\text{and}\quad \frac{-6(\rho_1-\rho_2)^2-8}{[(\rho_1-\rho_2)^2+2]^{3/2}}, \]

the absolute value of either being $\le 2\sqrt{2}$, which is attained in the first case
for $\rho_1 = -\rho_2$ and in the second for $\rho_1 = \rho_2$. It is seen that for high correlation
between x and y the skewness of xy can be quite large.

For the excess, we see that $\gamma_{4:z}$
attains a value of 12 when $\rho_1 = -\rho_2$, r = 1 or when $\rho_1 = \rho_2$, r = −1. Since
this is such an extraordinary value it does not seem worth while to carry out
the extended computation that seems to be required to verify one's surmise
that this is the maximum of the absolute value.
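
The extreme values quoted for the skewness and excess in the correlated case follow directly from the closed forms (13). The sketch below is illustrative only; the function names are assumptions, and the limiting values r = ±1 are simply substituted into the formulas.

```python
import math

def lambdas(rho1, rho2, r):
    """Second, third and fourth semi-invariants of z from (13)."""
    s, p = rho1**2 + rho2**2, rho1 * rho2
    lam2 = s + 2 * p * r + 1 + r**2
    lam3 = 6 * (s * r + p * (1 + r**2)) + 2 * r * (3 + r**2)
    lam4 = 12 * s * (1 + 3 * r**2) + 24 * p * r * (3 + r**2) + 6 * (1 + 6 * r**2 + r**4)
    return lam2, lam3, lam4

def skewness(rho1, rho2, r):
    lam2, lam3, _ = lambdas(rho1, rho2, r)
    return lam3 / lam2**1.5

def excess(rho1, rho2, r):
    lam2, _, lam4 = lambdas(rho1, rho2, r)
    return lam4 / lam2**2

skew_max = skewness(1.0, -1.0, 1.0)   # rho1 = -rho2, r = 1
skew_min = skewness(1.0, 1.0, -1.0)   # rho1 = rho2, r = -1
excess_a = excess(1.0, -1.0, 1.0)
excess_b = excess(1.0, 1.0, -1.0)
```

Setting r = 0 in the same functions recovers the uncorrelated values, for instance an excess of 6 at $\rho_1 = \rho_2 = 0$.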

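Now, to derive the frequency function we proceed as before. We set
$z = \frac{xy}{\sigma_1\sigma_2}$ and then

\[ F(z) = I_1(z) - I_2(z), \]

in which,

\[ I_1(z) = \frac{1}{2\pi\sqrt{1-r^2}}\int_0^{\infty}\exp\left\{-\frac{(x-\rho_1)^2-2r(x-\rho_1)\left(\frac{z}{x}-\rho_2\right)+\left(\frac{z}{x}-\rho_2\right)^2}{2(1-r^2)}\right\}\frac{dx}{x}, \]

and $I_2(z)$ is the integral of the same function over the interval $(-\infty, 0)$.
We can write $I_1(z)$:

\[ I_1(z) = \frac{e^{-\frac{\rho_1^2-2r\rho_1\rho_2+\rho_2^2}{2(1-r^2)}+\frac{rz}{1-r^2}}}{2\pi\sqrt{1-r^2}}\int_0^{\infty}\exp\left\{-\frac{x^2-2(\rho_1-r\rho_2)x+\frac{z^2}{x^2}-2(\rho_2-r\rho_1)\frac{z}{x}}{2(1-r^2)}\right\}\frac{dx}{x}. \]

Setting,

\[ \frac{x}{\sqrt{1-r^2}} = u \quad\text{and}\quad \frac{z}{1-r^2} = \zeta, \qquad \frac{\rho_1-r\rho_2}{\sqrt{1-r^2}} = R_1 \quad\text{and}\quad \frac{\rho_2-r\rho_1}{\sqrt{1-r^2}} = R_2, \]

the integral in the last expression is of the same form as the $\psi_1(z)$ in the
uncorrelated case. It is evident, then, that the distribution function of $\zeta$ can
be written (for $\zeta > 0$, as in (8)),

\[ F(\zeta) = \frac{\sqrt{1-r^2}}{\pi}\,e^{-\frac{\rho_1^2-2r\rho_1\rho_2+\rho_2^2}{2(1-r^2)}+r\zeta}\left[\Sigma_0(R_1R_2\zeta)K_0(\zeta) + (R_1^2+R_2^2)\frac{\zeta}{2!}\Sigma_2(R_1R_2\zeta)K_1(\zeta) + (R_1^4+R_2^4)\frac{\zeta^2}{4!}\Sigma_4(R_1R_2\zeta)K_2(\zeta) + \cdots\right], \]

and is essentially of the form of F(z), reached when r = 0, multiplied by an
exponential function.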

Frequency curves for xy (in standard units) are given in Fig. 1, Fig. 2 and 
Fig. 3. 



Fig. 1 






Tables of Ordinates of the Distribution Functions, F(z) and F(t)

For ρ1 = ρ2 = 0, r = 0 (curve is symmetrical with respect to origin): M_z = 0, σ_z = 1.
For ρ1 = 1, ρ2 = 0, r = 0 (curve is symmetrical with respect to origin): M_z = 0, σ_z = √2.

    ρ1 = ρ2 = 0, r = 0          ρ1 = 1, ρ2 = 0, r = 0
    z = t    F(z) = F(t)        z       F(z)       t       F(t)
     0.1      0.77256           0.1    0.58215    0.07    0.82328
     0.2      0.55790           0.2    0.44891    0.14    0.63485
     0.3      0.43887           0.3    0.37159    0.21    0.52551
     0.4      0.35477           0.4    0.31736    0.28    0.44882
     0.5      0.29425           0.5    0.27593    0.35    0.39023
     0.6      0.24749           0.6    0.24270    0.42    0.34323
     0.7      0.21025           0.7    0.21519    0.49    0.30432
     0.8      0.17996           0.8    0.19193    0.57    0.27143
     0.9      0.15493           0.9    0.17195    0.64    0.24318
     1.0      0.13402           1.0    0.15460    0.71    0.21863
     1.2      0.10138           1.2    0.12595    0.85    0.17812
     1.4      0.07756           1.4    0.10340    0.99    0.14623
     1.6      0.05983           1.6    0.08533    1.13    0.12068
     1.8      0.04645           1.8    0.07069    1.27    0.09997
     2.0      0.03625           2.0    0.05873    1.41    0.08306
     2.4      0.02235           2.4    0.04078    1.70    0.05767
     2.8      0.01395           2.8    0.02846    1.98    0.04025
     3.2      0.00878           3.2    0.01992    2.26    0.02818
     3.6      0.00557           3.6    0.01397    2.55    0.01976
     4.0      0.00355           4.0    0.00981    2.83    0.01388
     4.8      0.00146           4.8    0.00485    3.39    0.00685
     5.6      0.00061           5.6    0.00239    3.96    0.00338
     6.4      0.00026           6.4    0.00118    4.53    0.00167
     7.2      0.00011           7.2    0.00058    5.09    0.00082
     8.0      0.00005           8.0    0.00029    5.66    0.00040
     9.0      0.00002           9.0    0.00012    6.36    0.00017
    10.0      0.00001          10.0    0.00005    7.07    0.00007
                               11.0    0.00002    7.78    0.00003
                               12.0    0.00001    8.49    0.00001








For ρ1 = ρ2 = 1/2, r = 0: M_z = 0.25, σ_z = √6/2.

      z       F(z)        t       F(t)
    -9.6     0.00001    -8.04    0.00001
    -8.8     0.00002    -7.39    0.00002
    -8.0     0.00004    -6.74    0.00005
    -7.2     0.00010    -6.08    0.00012
    -6.4     0.00023    -5.43    0.00028
    -5.6     0.00054    -4.78    0.00066
    -4.8     0.00128    -4.12    0.00157
    -4.0     0.00311    -3.47    0.00381
    -3.6     0.00488    -3.14    0.00598
    -3.2     0.00769    -2.82    0.00942
    -2.8     0.01221    -2.49    0.01495
    -2.4     0.01954    -2.16    0.02393
    -2.0     0.03165    -1.84    0.03876
    -1.6     0.05213    -1.51    0.06384
    -1.2     0.08809    -1.18    0.10788
    -0.8     0.15568    -0.86    0.19066
    -0.4     0.30423    -0.53    0.37259
    -0.2     0.47388    -0.37    0.58036
    -0.1     0.64994    -0.28    0.79598
     0.1     0.68106    -0.12    0.83409
     0.2     0.51947    -0.04    0.63619
     0.4     0.36322     0.12    0.44484
     0.8     0.21768     0.45    0.26659
     1.2     0.14230     0.78    0.17427
     1.6     0.09621     1.10    0.11783
     2.0     0.06614     1.43    0.08100
     2.4     0.04589     1.76    0.05620
     2.8     0.03201     2.08    0.03920
     3.2     0.02241     2.41    0.02745
     3.6     0.01571     2.74    0.01924
     4.0     0.01103     3.06    0.01351
     4.8     0.00545     3.72    0.00667
     5.6     0.00269     4.36    0.00329
     6.4     0.00133     5.02    0.00163
     7.2     0.00065     5.67    0.00080
     8.0     0.00032     6.33    0.00039
     8.8     0.00016     6.98    0.00020
     9.6     0.00008     7.63    0.00010
    10.4     0.00004     8.29    0.00005
    11.2     0.00002     8.94    0.00002
    12.0     0.00001     9.59    0.00001



Fig. 2
















University of Michigan. 








A NEW EXPOSITION AND CHART FOR THE PEARSON SYSTEM 
OF FREQUENCY CURVES 

By Cecil C. Craig

In the course of some years of teaching classes in mathematical statistics,
the author has expanded the treatment of the Pearson system of frequency
functions begun in the Handbook of Mathematical Statistics¹ into an exposition
that he believes possesses marked advantages in unity, clarity, and elegance.
This is accomplished by expressing the variable in standard units throughout
and by making the two parameters $\alpha_3$ ($\alpha_3^2 = \beta_1$, $\alpha_4 = \beta_2$ in Pearson's notation)
and

\[ \delta = \frac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3} \]

fundamental in the discussion. The various formulae that arise are obtained
directly and in a uniform manner and are relatively simple in form and easy
to use. The criteria for the different members of the system of functions are
expressed very simply in terms of $\alpha_3$ and $\delta$ and the chart corresponding to the
extension of the Rhind diagram given by Pearson³ takes on a strikingly simple
form.

Following the beginning made in the Handbook,² the system of Pearson
frequency functions is to be found among the solutions of the differential
equation

\[ \frac{1}{y}\frac{dy}{dt} = \frac{a - t}{b_0 + b_1t + b_2t^2}. \tag{1} \]

For those solutions y = f(t) for which,

\[ \left[(b_0 + b_1t + b_2t^2)\,t^n f(t)\right]_{t=r}^{t=s} = 0, \]

if r and s are the extremes of the range of variation for t, and for which the
first n + 1 moments over this range exist, the recursion formula for moments,

\[ a\alpha_n + n\alpha_{n-1}b_0 + (n+1)\alpha_nb_1 + (n+2)\alpha_{n+1}b_2 = \alpha_{n+1} \tag{2} \]

can be derived. Then setting n = 0, 1, 2, 3 we get the following expressions
for the parameters, $a$, $b_0$, $b_1$, $b_2$ in terms of $\alpha_3$ and $\delta$:

\[ a = -\frac{\alpha_3}{2(1+2\delta)}, \qquad b_1 = \frac{\alpha_3}{2(1+2\delta)}, \qquad b_0 = \frac{2+\delta}{2(1+2\delta)}, \qquad b_2 = \frac{\delta}{2(1+2\delta)}, \tag{3} \]

valid except when $\delta = -\tfrac{1}{2}$. Below note will be taken of those solutions for
which the conditions imposed in deriving (2) are not satisfied. The case in
which $\delta = -\tfrac{1}{2}$ will be included in the discussion of the transitional types of
functions.

¹ H. L. Rietz, Editor-in-Chief; Houghton-Mifflin Co., Boston (1924). See the chapter
on Frequency Curves by H. C. Carver.

² The notation used is that of the Handbook, loc. cit., to which reference will be
frequently made. The discussion of Robert Henderson, "Frequency Curves and Moments,"
Transactions of the Actuarial Society of America, Vol. VIII (1904), pp. 30-41, also proceeds
along very similar lines, although Professor Carver was quite unaware of it when he wrote
his chapter in the Handbook. The notation of the Handbook seems preferable however.

³ Karl Pearson: Mathematical Contributions to the Theory of Evolution, XIX. Second
Supplement to a Memoir on Skew Variation; Proc. Roy. Soc., A. Vol. 216 (1916), plate
opposite p. 456.
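
The expressions (3) can be checked against the recursion formula (2) for any admissible pair of standard moments. The sketch below is illustrative and not part of the original article; the function name and the sample values $\alpha_3 = 1$, $\alpha_4 = 5$ are assumptions, and standard units are taken to mean $\alpha_0 = 1$, $\alpha_1 = 0$, $\alpha_2 = 1$.

```python
def pearson_parameters(alpha3, alpha4):
    """Return (delta, a, b0, b1, b2) from the standard moments, using (3)."""
    delta = (2 * alpha4 - 3 * alpha3**2 - 6) / (alpha4 + 3)
    denom = 2 * (1 + 2 * delta)
    if denom == 0:
        raise ValueError("formulas (3) are not valid for delta = -1/2")
    a = -alpha3 / denom
    b0 = (2 + delta) / denom
    b1 = alpha3 / denom
    b2 = delta / denom
    return delta, a, b0, b1, b2

def recursion_residual(n, alpha, a, b0, b1, b2):
    """Left side minus right side of the recursion formula (2)."""
    left = a * alpha[n] + n * alpha[n - 1] * b0 + (n + 1) * alpha[n] * b1 \
        + (n + 2) * alpha[n + 1] * b2
    return left - alpha[n + 1]

# Illustrative standard moments: alpha_0 .. alpha_4 with alpha_3 = 1, alpha_4 = 5.
alpha = [1.0, 0.0, 1.0, 1.0, 5.0]
delta, a, b0, b1, b2 = pearson_parameters(alpha[3], alpha[4])
residuals = [recursion_residual(n, alpha, a, b0, b1, b2) for n in (1, 2, 3)]

# The normal curve (alpha_3 = 0, alpha_4 = 3) gives delta = 0 and b0 = 1.
normal = pearson_parameters(0.0, 3.0)
```

The case n = 0 of (2) reduces to $a + b_1 = 0$, which (3) satisfies by inspection.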

It is useful to note that 

\[ -2 < \delta < 2. \]



and the result follows. One consequence of this is that bo cannot vanish for 
any Pearson frequency function possessing moments of the fourth order. 

Turning now to the integration of (1) and the development of the various
forms of f(t) that arise, it is useful to make the preliminary statements:

1. Over the range of variation of t, we must have $f(t) \ge 0$.

2. The area under curve y = f(t) over the range of variation must be finite.
This being true then we always determine the constant of integration so
that this area is unity.

3. The range in each case is taken as the maximum one for which (1) and (2)
may be secured which contains the point, t = 0.

4. It is sufficient throughout to take $\alpha_3 \ge 0$ since the curve for $\alpha_3 = -k$ is
only a reflection of that for $\alpha_3 = k$ through the line t = 0.


⁴ See the Handbook, pp. 103, 104.





It seems best to follow the Handbook in disposing of three of the transitional 
types before proceeding to the main types of the system and then to the remain¬ 
ing transitional types. 

The discussion is planned to embody a direct and uniform method of treatment,
giving simple formulae for the calculation of the parameters in terms of
$\alpha_3$ and $\delta$ in each case, and noting the salient features of each type of curve.
The criteria for each type are expressed in terms of $\alpha_3$ and $\delta$, which for the
whole system permit a simple graphical representation by means of the chart
found at the end of this article. The construction of this chart is made clear
in the derivation of the criteria.


Transitional Type: The Normal Frequency Function: α3 = δ = 0

In this case (1) reduces to,

\[ \frac{1}{y}\frac{dy}{dt} = -t, \]

from which

\[ y = Ce^{-\frac{t^2}{2}}. \tag{N} \]

The range is, of course, $(-\infty, \infty)$ with $C = (2\pi)^{-1/2}$. On the chart, which we
shall refer to as the $(\alpha_3^2, \delta)$-diagram, we see that this function corresponds to
but a single point.

It may have the appearance of reasoning in a circle to use the values of the
parameters given by (3), which were derived from (2), in solving (1) and then
for the solution obtained examine the validity of (2). However, we may argue
as follows: We will use the relations (3) as definitions of $a$, $b_0$, $b_1$, and $b_2$ in
terms of $\alpha_3$ and $\delta$ which are not yet defined. Using the values of $a$ and the b's
given by any choice of $\alpha_3$ and $\delta$, we solve (1). If the solution is such that for
it (2) may be derived, then the relations (3) are valid when $\alpha_3$ and $\delta$ have their
usual meanings. For convenience let us denote the conditions for the validity
of (2) by (A). It is obvious that conditions (A) are satisfied for (N).



Transitional Types: III, α3 ≠ 0, δ = 0; X, if also α3² = 4

We get here (See the Handbook, loc. cit.):

\[ f(t) = \frac{A^{A^2}}{e^{A^2}\,\Gamma(A^2)}\,(A+t)^{A^2-1}e^{-At}, \tag{III} \]

$A = 2/\alpha_3$, the range being $(-A, \infty)$.

It is readily verified that, since $A^2 - 1 > -1$, conditions (A) are satisfied.

For $A^2 > 1$ (i.e., for $\alpha_3^2 < 4$) the curve is bell-shaped; for $A^2 < 1$ it is J-shaped
with an infinite ordinate at t = −A. For the bell-shaped curve the mode
falls at t = −1/A and the mean minus the mode is $1/A = \alpha_3/2$.

For $A^2 = 1$, we have

\[ f(t) = e^{-(t+1)}, \tag{X} \]

which represents a J-shaped curve with the range $(-1, \infty)$.

For $A^2 \neq 1$, the function has been designated type III, the special case as
type X. On the $(\alpha_3^2, \delta)$-chart the points corresponding to type III functions
fall on the line $\delta = 0$, the type X functions being represented by a single point
on this line.

Turning now to the discussion of the three main types, we note that for
$\delta \neq 0$, $b_2 \neq 0$ and that consequently the denominator on the right in (1) is
always a quadratic which we can write in the form

\[ b_2(t - r_1)(t - r_2) \]

in which neither $r_1$ nor $r_2$ can be zero (since $b_0 \neq 0$), and

\[ r_1 = \frac{-b_1 + \sqrt{b_1^2 - 4b_0b_2}}{2b_2} = \frac{-\alpha_3 + \sqrt{\alpha_3^2 - 4\delta(\delta+2)}}{2\delta}, \qquad r_2 = \frac{-b_1 - \sqrt{b_1^2 - 4b_0b_2}}{2b_2} = \frac{-\alpha_3 - \sqrt{\alpha_3^2 - 4\delta(\delta+2)}}{2\delta}. \tag{4} \]

Leaving aside the special case, $r_1 = r_2$, to be dealt with later, we can always
solve (1) in the form

\[ f(t) = C(t - r_1)^{m_1}(t - r_2)^{m_2} \tag{5} \]

with

\[ m_1 = \frac{a - r_1}{b_2(r_1 - r_2)} = \frac{1+\delta}{\delta}\,\frac{\alpha_3}{\sqrt{D}} - \frac{1+2\delta}{\delta}, \qquad m_2 = \frac{a - r_2}{b_2(r_2 - r_1)} = -\frac{1+\delta}{\delta}\,\frac{\alpha_3}{\sqrt{D}} - \frac{1+2\delta}{\delta}, \tag{6} \]

in which $D = \alpha_3^2 - 4\delta(\delta+2)$.

For $\delta < 0$, the r's are real and opposite in sign; for $\delta > 0$ and $\alpha_3^2 < 4\delta(\delta + 2)$,
the r's are complex; and for $\delta > 0$ and $\alpha_3^2 > 4\delta(\delta + 2)$, the r's are real and of
the same sign. These three conditions with the additional condition that
$\alpha_3 \neq 0$ give rise respectively to the main types of frequency functions designated
I, IV, and VI. The points corresponding to them fall in simply determined
areas on the $(\alpha_3^2, \delta)$-chart. The boundaries of these areas, the curve,

\[ (2 + 3\delta)\alpha_3^2 = 4(1 + 2\delta)^2(2 + \delta), \]

which intersects the type I and type VI areas, and the line,

\[ \delta = -\tfrac{1}{2} \]

contain the points which correspond to the transitional types.
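
The criteria stated in this article can be gathered into a small classification routine. This is only a sketch under stated assumptions and not the author's chart: the function name, the numerical tolerance used to detect boundary cases, and the generic labels returned for the line $\delta = -\tfrac{1}{2}$ and for the boundary curve inside the type VI area are all choices made here for illustration. The type names are those used in the section headings of the article.

```python
import math

def pearson_type(alpha3, delta, tol=1e-9):
    """Classify a point (alpha3, delta) by the criteria of the article."""
    a2 = alpha3 * alpha3
    on_curve = math.isclose(
        (2 + 3 * delta) * a2, 4 * (1 + 2 * delta) ** 2 * (2 + delta),
        rel_tol=0.0, abs_tol=tol,
    )
    if delta <= -1:
        return "impossible"
    if abs(delta) <= tol:                      # the line delta = 0
        if a2 <= tol:
            return "Normal"
        return "X" if abs(a2 - 4) <= tol else "III"
    if abs(delta + 0.5) <= tol:                # the line delta = -1/2
        return "transitional (delta = -1/2)"
    if delta < 0:
        if a2 <= tol:
            return "II"
        if on_curve:
            return "VIII" if delta < -0.5 else "IX"
        return "I"
    # delta > 0 from here on
    if a2 <= tol:
        return "VII"
    disc = a2 - 4 * delta * (delta + 2)
    if abs(disc) <= tol:
        return "V"
    if disc < 0:
        return "IV"
    return "transitional (boundary curve)" if on_curve else "VI"

examples = {
    "normal": pearson_type(0.0, 0.0),
    "type_x": pearson_type(2.0, 0.0),
    "type_i": pearson_type(0.5, -0.25),
    "type_iv": pearson_type(1.0, 0.5),
    "type_v": pearson_type(math.sqrt(5.0), 0.5),
    "type_vi": pearson_type(3.0, 0.5),
    "type_ix": pearson_type(math.sqrt(1.4), -0.25),
}
```

Exact boundary cases are decided with a floating-point tolerance, so inputs lying extremely close to a boundary may be assigned to the transitional type.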


Main Type I. [α3 ≠ 0, −1 < δ < 0 (δ ≠ −1/2), (2 + 3δ)α3² ≠ 4(1 + 2δ)²(2 + δ)]

For $\alpha_3 > 0$, we see that

\[ r_1 < 0 < r_2 \quad\text{and that}\quad |r_1| < |r_2|. \]

The range is taken to be $(r_1, r_2)$ and (5) is written

\[ y = C(t - r_1)^{m_1}(r_2 - t)^{m_2}. \tag{I} \]

It is evident that the area under the curve over this interval is finite only
when $m_1 + 1 > 0$ and $m_2 + 1 > 0$ and that if these inequalities hold moments
of all orders exist. In this case also conditions (A) are satisfied. Now

\[ m_1 + 1 = -\frac{1+\delta}{\delta}\left(1 - \frac{\alpha_3}{\sqrt{D}}\right), \qquad m_2 + 1 = -\frac{1+\delta}{\delta}\left(1 + \frac{\alpha_3}{\sqrt{D}}\right), \]

where, as in (6), $D = \alpha_3^2 - 4\delta(\delta+2)$, and in the present case

\[ 1 \pm \frac{\alpha_3}{\sqrt{D}} > 0. \]

Thus $m_1 + 1$ and $m_2 + 1$ are each > 0 only if $\delta > -1$. On the chart, then,
the points for $\delta < -1$ correspond to no frequency functions; they fall in the
"Impossible Area."

Further the type I curve will be U-shaped, J-shaped, or bell-shaped if both
m's are < 0, if the m's are opposite in sign, or if both are > 0. We have

\[ m_1 = -\frac{1+\delta}{\delta}\left(1 - \frac{\alpha_3}{\sqrt{D}}\right) - 1. \]

Since for $-1 < \delta < -\tfrac{1}{2}$,

\[ 0 < -\frac{1+\delta}{\delta} < 1, \]

we see that $m_1 < 0$ ($\alpha_3 > 0$) for $\delta$ in this interval. For $-\tfrac{1}{2} < \delta < 0$, $m_1 > 0$
only if

\[ -\frac{1+\delta}{\delta}\left(1 - \frac{\alpha_3}{\sqrt{D}}\right) > 1, \]

which leads to the condition:

\[ (2 + 3\delta)\alpha_3^2 < 4(1 + 2\delta)^2(2 + \delta). \]

Also

\[ m_2 = -\frac{1+\delta}{\delta}\left(1 + \frac{\alpha_3}{\sqrt{D}}\right) - 1, \]

whence it is similarly seen that $m_2 > 0$ when $-\tfrac{1}{2} < \delta < 0$, and that otherwise
$m_2 > 0$ only when

\[ (2 + 3\delta)\alpha_3^2 > 4(1 + 2\delta)^2(2 + \delta). \]

Thus the curve,

\[ (2 + 3\delta)\alpha_3^2 = 4(1 + 2\delta)^2(2 + \delta), \]

being tangent to the line $\alpha_3^2 = 0$ at $\delta = -\tfrac{1}{2}$, divides the type I area on the chart
into three parts: Above it lie the points corresponding to U-shaped curves,
to the right of it the points corresponding to J-shaped curves, and below it
the points corresponding to bell-shaped curves. (Note that for $\delta < -\tfrac{2}{3}$ the
curves are always U-shaped.)

Since $r_2 - r_1 > 0$ and $b_2 \gtrless 0$ accordingly as $\delta \lessgtr -\tfrac{1}{2}$, it is readily verified
that $r_1 < a < r_2$ only for U- or bell-shaped curves. The sign of $a$ is always
opposite to that of $\alpha_3$ for curves with a mode. Finally the constant is determined
by setting

\[ C\int_{r_1}^{r_2}(t - r_1)^{m_1}(r_2 - t)^{m_2}\,dt = 1, \]

giving

\[ C = \frac{1}{\beta(m_1+1,\, m_2+1)\,(r_2 - r_1)^{m_1+m_2+1}}. \]


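Main Type IV: α3 ≠ 0, δ > 0, and α3² < 4δ(δ + 2)

In this case we write:

\[ r_1 = \frac{-\alpha_3}{2\delta} + \frac{i\sqrt{-D}}{2\delta} = -r + is, \qquad r_2 = -r - is, \]

\[ m_1 = -\frac{1+2\delta}{\delta} - \frac{1+\delta}{\delta}\,\frac{i\alpha_3}{\sqrt{-D}} = -m + \frac{i\nu}{2}, \qquad m_2 = -m - \frac{i\nu}{2}. \]

With this notation (5) becomes

\[ y = C\left[(t+r)^2 + s^2\right]^{-m}\left(\frac{t + r - is}{t + r + is}\right)^{\frac{i\nu}{2}}, \]

and since,

\[ \left(\frac{a - bi}{a + bi}\right)^{\frac{ci}{2}} = e^{c\tan^{-1}\frac{b}{a}} = e^{c\left(\frac{\pi}{2} - \tan^{-1}\frac{a}{b}\right)}, \]

the frequency function can be written,

\[ y = C\left[(t+r)^2 + s^2\right]^{-m}e^{-\nu\tan^{-1}\frac{t+r}{s}}. \tag{IV} \]

It is readily seen that m > 0, that $\nu$ is opposite in sign to $\alpha_3$, that

\[ \tan^{-1}\frac{t+r}{s} \]

can always be taken to lie between $-\tfrac{\pi}{2}$ and $\tfrac{\pi}{2}$, and that the range can
be taken $(-\infty, \infty)$.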

In the previously discussed cases in which $\delta \le 0$, if the area under the curve
was finite moments of all orders existed. In the present case, the area and the
first four moments are always finite but this may fail to be true of moments of
higher orders. For, since $0 < \delta < 2$,

\[ m = \frac{1 + 2\delta}{\delta} > \frac{5}{2}, \]

and the integral,

\[ \int_{-\infty}^{\infty} t^n f(t)\,dt \]

for f(t) given by (IV) will be finite for $n \le 4$ and infinite for n = 5 if $\delta \ge 1$.
In order for the n-th moment to exist we must have

\[ 2m > n + 1 \]

or

\[ \delta < \frac{2}{n - 3}. \]

Pearson designated as heterotypic those members of his system of frequency
functions for which the eighth moment failed to exist. (In such a case the
standard deviation of the fourth moment in samples would be infinite.) Setting
n = 8, we get $\delta = 2/5$ as the deadline on the $(\alpha_3^2, \delta)$-chart.

It was apparent that conditions (A) were satisfied for $-1 < \delta \le 0$. (It
will appear below that the case in which $\delta = -\tfrac{1}{2}$ is no exception.) For $\delta > 0$
it will be seen that it is generally true, as in the present case, that the formulae
(2) and (3) can be derived if $\alpha_{n+2}$ exists, i.e., if

\[ \delta < \frac{2}{n - 1}. \]
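
The moment condition $2m > n + 1$ translates into a simple rule for the highest finite moment at a given $\delta > 0$. The sketch below is illustrative only; the function names are assumptions, and values of $\delta$ lying exactly on a deadline such as 2/5 may be affected by floating-point rounding.

```python
import math

def highest_finite_moment(delta):
    """Largest n with 2m > n + 1, where m = (1 + 2*delta)/delta and delta > 0."""
    if delta <= 0:
        raise ValueError("the condition applies only for delta > 0")
    bound = 3 + 2 / delta          # the n-th moment exists iff n < 3 + 2/delta
    return math.ceil(bound) - 1

def is_heterotypic(delta):
    """True when the eighth moment fails to exist (Pearson's heterotypic case)."""
    return highest_finite_moment(delta) < 8

orders = {d: highest_finite_moment(d) for d in (0.25, 0.5, 1.0)}
```

For example, $\delta = 1$ gives a highest finite moment of order 4, in agreement with the statement that the fifth moment is infinite there.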



To determine C, on setting the integral of (IV) over the interval $(-\infty, \infty)$
equal to unity, we get

g2m—1 

^ (?(2w - 2, v) 

in which 


G{2m — 2, 



gin2m-2^giV ® 




2 


— tan“‘ 


'-^0 


Main Type VI: α3 ≠ 0, δ > 0, α3² > 4δ(δ + 2) [(2 + 3δ)α3² ≠ 4(1 + 2δ)²(2 + δ)]

The conditions specify the remaining area on the chart. This may be left
in the form

\[ y = C(t - r_1)^{m_1}(t - r_2)^{m_2}. \tag{5} \]

Now $r_1$ and $r_2$ are both opposite in sign to $\alpha_3$, which, as usual, we will consider
positive, and $|r_2| > |r_1|$. Always $m_2 < 0$ and $m_1 \gtrless 0$ accordingly as

\[ (2 + 3\delta)\alpha_3^2 \lessgtr 4(1 + 2\delta)^2(2 + \delta). \]

We note that

\[ a - r_2 = b_2(r_2 - r_1)m_2 > 0, \]

since now $b_2 > 0$, and that

\[ a - r_1 = b_2(r_1 - r_2)m_1 \]

has the same sign as $m_1$. Finally $a < 0$.

Thus for $\alpha_3 > 0$ and $m_1 > 0$, the point t = a on the axis of t lies to the right
of both $t = r_1$ and $t = r_2$. Also

\[ m_1 + m_2 = -\frac{2(1 + 2\delta)}{\delta}. \]

The range is taken $(r_1, \infty)$, the curve being bell-shaped when $m_1 > 0$. If
$m_1 < 0$, the curve is J-shaped, t = a now lying to the left of $t = r_1$.

Since

\[ m_1 + m_2 < -5, \quad\text{and}\quad m_1 + 1 > 0, \]

the area and the first four moments always exist. In order for the n-th moment
to be finite, we must have

\[ -(m_1 + m_2) > n + 1 \]

which is the same condition as in the case of the type IV function, giving the
same deadline, $\delta = 2/5$.


⁵ Cf. Tables for Statisticians and Biometricians, Cambridge Univ. Press, Part I, 2nd
edition (1924), p. lxxxi.


If the origin be shifted to the point, $t = r_2$, we have, writing,

\[ t - r_2 = z, \qquad r_1 - r_2 = a, \]

for the type VI function the expression,

\[ y = Cz^{m_2}(z - a)^{m_1}, \tag{VI} \]

with the range $(a, \infty)$. Finally

\[ C = \frac{a^{-m_1-m_2-1}}{\beta(m_1 + 1,\, -m_1 - m_2 - 1)}. \]


Transitional Type II: α3 = 0, −1 < δ < 0 (δ ≠ −1/2)

In this case,

\[ r_1 = -r_2 = \frac{\sqrt{-\delta(2+\delta)}}{\delta} < 0, \]

\[ m_1 = m_2 = -\frac{1+2\delta}{\delta} \gtrless 0 \quad\text{accordingly as}\quad \delta \gtrless -\tfrac{1}{2}. \]

The frequency function is a special case of type I; setting,

\[ -r_1 = r_2 = S, \qquad m_1 = m_2 = M, \]

we can write it in the form,

\[ y = C(S^2 - t^2)^{M}. \tag{II} \]

As in all cases in which $\alpha_3 = 0$, the curve is symmetrical about the mean.⁶ As
in the type I case, the area and moments do not exist for $\delta \le -1$; for
$-1 < \delta < -\tfrac{1}{2}$, the curve is U-shaped; for $-\tfrac{1}{2} < \delta < 0$, it is bell-shaped. The range
is, of course, (−S, S).

Finally,

\[ C = \frac{1}{(2S)^{2M+1}\,\beta(M+1,\, M+1)}. \]

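Transitional Type VII: $\alpha_3 = 0$, $\delta > 0$


This function may be regarded as a special case of type IV, with

$$r = 0, \quad s = \sqrt{\frac{2 + \delta}{\delta}} > 0, \quad \nu = 0, \quad \text{and} \quad m = \frac{1 + 2\delta}{\delta},$$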


* It follows at once from the recursion formula,

$$\alpha_{n+1} = \frac{n}{2 - (n - 2)\delta} \left[ (2 + \delta)\, \alpha_{n-1} + \alpha_3\, \alpha_n \right],$$

obtained from setting the expressions (3) in (2), that on changing the sign of $\alpha_3$, the signs
of all the odd moments are changed.





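and we write the function:

$$\text{(VII)} \qquad y = C (t^2 + s^2)^{-m}.$$

The type VII function may equally well be derived from the type II function by noting that

$$S = is \quad \text{and} \quad M = -m.$$

The range is $(-\infty, \infty)$, however, and for $\delta \ge 2/5$ the function is heterotypic. Finally

$$C = \frac{s^{2m-1}\, \Gamma(m)}{\sqrt{\pi}\, \Gamma(m - \tfrac{1}{2})}.$$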


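Transitional Type V: $\alpha_3 \neq 0$, $\delta > 0$, $\alpha_3^2 = 4\delta(\delta + 2)$

Here

$$r_1 = r_2 = -r,$$

and we return to (1) to derive the form of the function, writing it (the type V can also be derived as a limiting form of type VI):

$$\frac{1}{y} \frac{dy}{dt} = \frac{a - t}{b_2 (t + r)^2}.$$

On integration we get

$$y = C (t + r)^{-1/b_2} \exp\left[ -\frac{a + r}{b_2 (t + r)} \right] = C (t + r)^{-2(1 + 2\delta)/\delta} \exp\left[ -\frac{\alpha_3 (1 + \delta)}{\delta^2 (t + r)} \right],$$

$$\text{(V)} \qquad y = C (t + r)^{-2m} \exp\left[ -\frac{2r(m - 1)}{t + r} \right].$$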

We note that $r$ has the same sign as $\alpha_3$ and that $m = 2 + 1/\delta$. The range is taken to be $(-r, \pm\infty)$ accordingly as $\alpha_3 \gtrless 0$. The curve is always bell-shaped. In order for the $n$-th moment to exist we must have, as always when $\delta > 0$,

$$4 + 2/\delta > n + 1,$$

leading to the same conclusions as in the type IV or VI case. Finally

$$C = \frac{[2r(m - 1)]^{2m-1}}{\Gamma(2m - 1)}.$$


Transitional Type VIII: $\alpha_3 \neq 0$, $\delta < -\tfrac{1}{2}$, $(2 + 3\delta)\alpha_3^2 = 4(1 + 2\delta)^2 (2 + \delta)$

The function is a special case of type I in which $m_1 < 0$ and $m_2 = 0$. But when $m_2 = 0$, $m_1 = -2m$, and the frequency function becomes

$$\text{(VIII)} \qquad y = C (t - r_1)^{-2m}.$$





The range is $(r_1, r_2)$, the curve being J-shaped with an infinite ordinate at $t = r_1$ and a finite one at $t = r_2$. In this case,

$$C = \frac{1 - 2m}{(r_2 - r_1)^{1 - 2m}} \qquad (1 - 2m > 0).$$


Transitional Type IX: $\alpha_3 \neq 0$, $-\tfrac{1}{2} < \delta < 0$, $(2 + 3\delta)\alpha_3^2 = 4(1 + 2\delta)^2 (2 + \delta)$

We have another special type I function in which $m_1 = 0$ and $m_2 = -2m > 0$. The function is

$$\text{(IX)} \qquad y = C (r_2 - t)^{-2m},$$

the range still being $(r_1, r_2)$, the curve being J-shaped with a finite ordinate at $t = r_2$. $C$ has the same value as in the type VIII case.


Transitional Type XI: $\alpha_3 \neq 0$, $0 < \delta < 2/5$, $(2 + 3\delta)\alpha_3^2 = 4(1 + 2\delta)^2 (2 + \delta)$

The function is a special type VI in which $m_1 = 0$, and $m_2 = -2m < 0$, and we may write it

$$\text{(XI)} \qquad y = C (t - r_2)^{-2m},$$

with the range still $(r_1, \infty)$. The curve is J-shaped with a finite ordinate at $t = r_1$. Again,

$$C = (r_1 - r_2)^{2m-1} \left( 3 + \frac{2}{\delta} \right).$$

Transitional Type XII: $\delta = -\tfrac{1}{2}$

If $\delta = -\tfrac{1}{2}$, the four linear equations derived from (2), from which the values of $a$, $b_0$, $b_1$, and $b_2$ in (3) are derived, are inconsistent. We can however set the values (3) in the differential equation (1) and, from its limiting form as $\delta \to -\tfrac{1}{2}$, derive the function appropriate to this case.

We obtain

$$\frac{1}{y} \frac{dy}{dt} = \frac{-\alpha_3 - 2(1 + 2\delta)\, t}{(2 + \delta) + \alpha_3 t + \delta t^2},$$

and if $\delta = -\tfrac{1}{2}$ this becomes

$$\frac{1}{y} \frac{dy}{dt} = \frac{2\alpha_3}{t^2 - 2\alpha_3 t - 3} = \frac{2\alpha_3}{(t - r_1)(t - r_2)},$$

with

$$r_1 = \alpha_3 - \sqrt{\alpha_3^2 + 3}, \qquad r_2 = \alpha_3 + \sqrt{\alpha_3^2 + 3}.$$

On integration,

$$y = C (t - r_1)^{m_1} (t - r_2)^{m_2},$$





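in which

$$m_1 = -\frac{\alpha_3}{\sqrt{\alpha_3^2 + 3}}, \qquad m_2 = \frac{\alpha_3}{\sqrt{\alpha_3^2 + 3}}.$$

We observe that ($\alpha_3 > 0$)

$$r_2 > 0 > r_1, \qquad |r_2| > |r_1|, \qquad m_2 = -m_1 > 0.$$

Taking the range to be $(r_1, r_2)$, we write,

$$\text{(XII)} \qquad y = C \left( \frac{r_2 - t}{t - r_1} \right)^{m_2},$$

the curve being J-shaped. Here

$$C = \frac{1}{(r_2 - r_1)\, \beta(1 - m_2,\, 1 + m_2)}.$$

The values of the parameters and the form of the function can also be derived as a special type I function in which $\delta = -\tfrac{1}{2}$.

Finally we note that for $\alpha_3 = 0$, (XII) reduces to

$$y = C,$$

thus including the rectangular distribution function among the Pearson system.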

In the course of the above discussion a system of criteria for the various types of functions has been set up in terms of $\alpha_3$ and $\delta$, in terms of which in every case the parameters may be readily calculated. The $(\alpha_3^2, \delta)$-chart which makes these criteria visual is comparatively simple to construct and is strikingly simple in appearance. Besides the lines,

$$\delta = -1, \qquad \delta = 0, \qquad \alpha_3 = 0,$$

it contains only the curve

$$\alpha_3^2 = 4\delta(\delta + 2),$$

on which the points corresponding to the type V function lie, and the curve,

$$(2 + 3\delta)\,\alpha_3^2 = 4(1 + 2\delta)^2 (2 + \delta),$$

on which the points corresponding to the functions of types VIII, IX, X, and XI are found. I must take occasion to express my thanks to Mr. Simon Yang who constructed this chart for me.
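The criteria for the transitional types treated above can be collected into a short routine. This is only a sketch under stated assumptions: it covers the transitional types II, V, VII, VIII, IX, XI and XII as their conditions are read from this section, it returns `None` for every other point of the chart (main types and type X are not handled), and the function name and the tolerance `tol` are hypothetical.

```python
def transitional_type(alpha3, delta, tol=1e-9):
    """Name the transitional Pearson type at (alpha3, delta), or None.

    Only the transitional types discussed in this section are recognised;
    equality conditions are tested to within a relative tolerance.
    """
    a2 = alpha3 * alpha3

    def close(u, v):
        return abs(u - v) <= tol * max(1.0, abs(u), abs(v))

    if close(delta, -0.5):
        return "XII"
    if close(alpha3, 0.0):
        if -1 < delta < 0:
            return "II"
        if delta > 0:
            return "VII"
        return None
    # Curve alpha3^2 = 4 delta (delta + 2): type V.
    if delta > 0 and close(a2, 4 * delta * (delta + 2)):
        return "V"
    # Curve (2 + 3 delta) alpha3^2 = 4 (1 + 2 delta)^2 (2 + delta).
    if close((2 + 3 * delta) * a2, 4 * (1 + 2 * delta) ** 2 * (2 + delta)):
        if delta < -0.5:
            return "VIII"
        if -0.5 < delta < 0:
            return "IX"
        if 0 < delta < 0.4:
            return "XI"
    return None
```

In practice sample values of $\alpha_3$ and $\delta$ will almost never fall exactly on a line or curve, so a routine of this kind is more useful for checking the chart than for classifying data.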



[Figure: The $(\alpha_3^2, \delta)$ Chart for the Pearson System of Frequency Curves. (The subscript L refers to bell-shaped curves.)]


The University of Michigan. 









RANK CORRELATION AND TESTS OF SIGNIFICANCE INVOLVING 
NO ASSUMPTION OF NORMALITY*†

By Harold Hotelling and Margaret Richards Pabst 

1. Dependence of Tests of Significance on Normality 

The powerful tests of significance, largely the work of R. A. Fisher, which have 
been revolutionizing statistical theory and practice, are in the main based on 
the assumption of a normal distribution in a hypothetical population from which 
the observations are a random sample. The nature and extent of the errors 
likely to result from the application of a test of significance assuming normality, 
where normality does not really exist, have been the subject of investigations 
both experimental and mathematical,¹ which however have not produced
satisfactory substitutes for Fisher's methods. A false assumption of normality
does not usually give rise to serious errors in the interpretation of simple means, 
since the distribution of a mean of any considerable number of cases is very 
nearly normal, no matter what the nature of the parent population, so long as 
it does not fall within a certain class having infinite range, and including the 
Cauchy distribution. The sampling distributions of second-order statistics are 
however more seriously disturbed by lack of normality, as is evident from their 
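standard errors. For example the variance $(\mu_4 - \mu_2^2)/n$ of sample variances is
much affected if $\beta_2 = \mu_4/\mu_2^2$ differs considerably, as it often does, from the value 3
which it takes for a normal distribution. Likewise the approximate variance
of the correlation coefficient,

$$\sigma_r^2 = \frac{1}{n \mu_{20} \mu_{02}} \left\{ \mu_{22} + \frac{\mu_{40}\mu_{11}^2}{4\mu_{20}^2} + \frac{\mu_{04}\mu_{11}^2}{4\mu_{02}^2} + \frac{\mu_{22}\mu_{11}^2}{2\mu_{20}\mu_{02}} - \frac{\mu_{31}\mu_{11}}{\mu_{20}} - \frac{\mu_{13}\mu_{11}}{\mu_{02}} \right\},$$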

* Research under a grant-in-aid from the Carnegie Corporation of New York. 

† Presented to the American Mathematical Society at New York, Oct. 26, 1935.

¹ J. L. Carlson, A Study of the Distribution of Means Estimated from Small Samples by the
Method of Maximum Likelihood for Pearson's Type II Curve, Unpublished M. A. Thesis,
Leland Stanford Junior University, 1931.

Leone Chesire, Elena Oldis and Egon S. Pearson, Further Experiments on the Sampling 
Distribution of the Correlation Coefficient, Journal of the American Statistical Association, 
June, 1932, pp. 121-128. 

Victor Perlo, On the Distribution of Student's Ratio for Samples of Three Drawn from a 
Rectangular Distribution, Biometrika, Vol. XXV, Parts I and II, May, 1933, pp. 203-204. 

Paul R. Rider, On the Distribution of the Ratio of Mean to Standard Deviation in Small 
Samples from Non-Normal Universes, Biometrika, Vol. XXI, Parts I to IV, December, 1929, 
pp. 124-143. 

H. L. Rietz, Note on the Distribution of the Standard Deviation, etc., Biometrika, Vol. 
XXIII, 1931, pp. 424-426. 

W. A. Shewhart and F. W. Winters, Small Samples—New Experimental Results, Bell 
Telephone Laboratories, Reprint B-327, July, 1928. 



where $\mu_{ij}$ is the mean value of $x^i y^j$, and $\mu_{10} = \mu_{01} = 0$, may be substantially
different from the value $(1 - \rho^2)^2/n$ commonly used, to which it reduces if the
population has the bivariate normal distribution. It is however remarkable
that if the variates are really independent, so that $\mu_{11} = 0$ and $\mu_{22} = \mu_{20}\mu_{02}$, this
formula reduces to

$$\sigma_r^2 = \frac{1}{n},$$


regardless of the form of the distribution. It should of course be remembered 
that these formulae give only the first term of an expansion in inverse powers of 
n, and also that the standard error fails for small samples to characterize the 
distribution adequately. But the sensitiveness of the standard error formula to 
deviations from normality in the population is a symptom of the grave dangers
in using even those distributions which for normal populations are accurate, 
in the absence of definite evidence of normality. 

To substitute in standard error formulae values of the higher moments esti¬ 
mated from the data does not meet the difficulty satisfactorily, since these higher 
moments are themselves subject to sampling errors which are often large, and 
since no exact distributions can ever be obtained in this way. The use of an 
arbitrary system of distributions such as the Pearson curves is subject to the 
same criticisms as that of the normal distribution. These and other special 
distributions may indeed be justified in special cases by general reasoning; an 
example of this in introducing a measure of relationship other than the correlation 
coefficient is to be found in the genetic discussion of Chapter 9 of Fisher's "Statistical
Methods for Research Workers." But for a great deal of statistical work
no such a priori reasoning is available and sufficient to specify a distribution in 
sufficient detail. If a specific form of distribution other than the normal can be 
relied on in a particular case, the mathematical problem of finding the exact 
distribution of the appropriate statistic will still commonly be found difficult or
impossible. 


2. Tests Independent of Normality Assumptions 

A set of problems is thus encountered regarding the nature and methods of 
statistical inference possible without assuming any particular distribution of the 
variates in the population from which we have a sample. Tests of significance 
underlying such inferences must clearly be invariant under all transformations 
of each variate. We are thus forced to rely for our information on relations of 
ordevy or of qualitative classification, rather than upon magnitudes, excepting 
insofar as we can use inequalities such as that of Tchebycheff. Classification 
leads to the use of contingency tables, from which accurate probabilities are 
calculable for testing whether or not the two or more principles of cross-classifica¬ 
tion used are independent. If the probability obtained is so small as to render 
it incredible that independence exists, the further problem arises of measuring 
the degree of relationship; but in the absence of special assumptions, such as that 





of the bivariate normal distribution, or those in Fisher's genetic example mentioned
above, the problem of measuring degree of relationship is insoluble. Any
measure of degree of relationship will change its value, unless this value corre¬ 
sponds to independence, when transformations other than those of a restricted 
class are applied to one of the variates. The problem of measuring degree of 
relationship, or correlation, is thus of quite a different character from that of 
testing the existence of a relationship, which is equivalent to absence of inde¬ 
pendence. The existence of correlation may be detected by methods of rank 
order or of classification; these can never, by themselves, be sufficient for its 
measurement. 

To test the deviation of the center of a symmetrical population from some 
definite hypothetical value, Student’s distribution, which is appropriate when 
the population is normal, may be replaced by the binomial distribution, which 
will sometimes show that the preponderance of cases on one side of the hypotheti¬ 
cal value is too great to admit the hypothesis. Fisher applied this principle to 
Student's original example, showing at the same time that it can in certain cases
be used to test the significance of the difference between the means of two
samples.² Both this type of test and the use of contingency tables with grouped
values of variates bring out clearly the fact that abandonment of the assumption 
of normality is equivalent to a certain loss of information, larger samples being 
required to make up for the lack of knowledge of the form of the population. 
The loss of information is greater for contingency tables arranged according to 
the values of the variates than when an appropriate method of rank correlation 
is used, for the contingency table may be regarded as derived from the ranks by 
grouping them, thus discarding some of the information. 
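The binomial substitute for Student's test described above can be sketched as follows. The sketch assumes that, under the hypothesis, each observation falls above or below the hypothetical center with probability 1/2, that observations equal to the center have already been set aside, and that a two-sided probability is wanted; these choices and the function name are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def sign_test_p_value(n_above, n_below):
    """Two-sided binomial probability of a split at least this uneven."""
    n = n_above + n_below
    if n == 0:
        raise ValueError("at least one observation is required")
    k = max(n_above, n_below)
    # Probability of k or more cases on one given side, each side having chance 1/2.
    one_tail = Fraction(sum(comb(n, j) for j in range(k, n + 1)), 2 ** n)
    return min(Fraction(1), 2 * one_tail)
```

For ten observations all on one side the probability is 2/1024, so the hypothesis would be rejected by any ordinary standard, while an even split gives no evidence against it.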

We shall in §8 illustrate a combination of rank and contingency methods 
suitable for utilizing simultaneously two kinds of information contained in 
grouped data. 

For large samples a method of treatment for which a great deal is to be said in
many cases consists of replacing the observed variate by a new variate x to which
a value is assigned for each individual or frequency class by interpolation in a
table of the normal probability integral, in such a way that the distribution of x
in the sample approximates normality. If this is done for each of two variates
which do not have the bivariate normal distribution, the transformed values x
and y may also lack the bivariate normal distribution, even approximately,
though each is normally distributed, so far as we can speak of a sample as being
normally distributed. Even if the bivariate distribution is normal, the correla¬ 
tion coefficient of x and y will not have the same distribution as the correlation 
coefficient in samples drawn from a bivariate normal distribution, since in the 
latter case the distributions of x and y separately would in most samples be less 
nearly normal than when the transformation to approximate normality is 
applied. From these considerations it follows that for the detection of correla¬ 
tion the normalizing transformation cannot be said in general to be the best 

² R. A. Fisher, Statistical Methods for Research Workers, Art. 24, end.





method, even for large samples, though it may be a useful preliminary to the 
application of the method of least squares or to the use of correlation coefficients 
significantly different from zero in certain cases. 
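The normalizing transformation discussed above may be illustrated by a small sketch. It assumes distinct observations and assigns to the observation of rank $k$ in a sample of $n$ the normal deviate whose probability integral is $(k - \tfrac{1}{2})/n$; this plotting position, like the function name, is an assumption made here and is not prescribed in the text.

```python
from statistics import NormalDist


def normal_scores(values):
    """Replace each observation by a standard normal deviate of matching rank.

    Assumes no ties. The plotting position (rank - 0.5)/n is one
    common convention, chosen here only for illustration.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores
```

The transformed values depend on the data only through the ranks, which is why the correlation coefficient computed from them cannot be expected to follow the distribution known for samples from a bivariate normal population.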

3. The Rank Correlation Coefficient 

Suppose that n individuals are arranged in two orders with respect to two 
different attributes. Thus we might arrange a freshman class in order according 
to their grades in a language examination, and also according to their mathemat¬ 
ical grades. As another example, we might be able to obtain ratings of various 
states with respect to penal law or practice, and also with respect to amount of 
crime. Continuous variates expressing these qualities are likely not to be nor¬ 
mally distributed, so that the product-moment correlation coefficient r cannot 
be expected to have the exact distribution known for it in the case of samples
from a normal population. We may therefore resort to the ranks, ignoring any 
exact values that have been assigned. 

Calling $X_i$ the rank of the $i$th individual with respect to one attribute, and $Y_i$
his rank with respect to the other, so that $(X_1, X_2, \cdots, X_n)$ and $(Y_1, Y_2, \cdots, Y_n)$
are two permutations of the numbers $(1, 2, \cdots, n)$, let us put $x_i = X_i - \bar{x}$,
$y_i = Y_i - \bar{y}$, where

$$\bar{x} = \bar{y} = \frac{n + 1}{2}.$$

The rank correlation coefficient is defined as

$$\text{(2)} \qquad r' = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}},$$


the sums being over the $n$ values in the sample. Now since the sum of the first
$n$ integers is $n(n + 1)/2$, and the sum of their squares is $n(n + 1)(2n + 1)/6$,
we have

$$\text{(3)} \qquad \sum x^2 = \sum (X - \bar{x})^2 = \sum X^2 - (\sum X)^2/n = \frac{n(n + 1)(2n + 1)}{6} - \frac{n(n + 1)^2}{4} = \frac{n^3 - n}{12},$$

and $\sum y^2$ has the same value. Also, if we put $d_i$ for the difference between the
two ranks for the $i$th individual, so that

$$d_i = X_i - Y_i = x_i - y_i,$$

we have

$$\sum d^2 = \sum x^2 - 2\sum xy + \sum y^2 = \frac{n^3 - n}{6} - 2\sum xy.$$

Substituting in (2) the value of $\sum xy$ found from this equation, and also the values
just obtained for $\sum x^2$ and $\sum y^2$, we have:






$$\text{(4)} \qquad r' = 1 - \frac{6 \sum d^2}{n^3 - n}.$$


This is the most convenient formula for computing r'. 
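The agreement of (2) and (4) may be verified on any pair of rankings. The sketch below computes $r'$ both ways in exact rational arithmetic; it assumes that `X` and `Y` are permutations of $1, 2, \cdots, n$ with $n \ge 2$, and the function names are illustrative.

```python
from fractions import Fraction


def r_prime_from_differences(X, Y):
    """Formula (4): r' = 1 - 6 * sum(d^2) / (n^3 - n)."""
    n = len(X)
    sum_d2 = sum((xi - yi) ** 2 for xi, yi in zip(X, Y))
    return 1 - Fraction(6 * sum_d2, n ** 3 - n)


def r_prime_from_products(X, Y):
    """Formula (2), using deviations of the ranks from their mean (n + 1)/2."""
    n = len(X)
    mean = Fraction(n + 1, 2)
    sum_xy = sum((xi - mean) * (yi - mean) for xi, yi in zip(X, Y))
    sum_x2 = sum((xi - mean) ** 2 for xi in X)
    # For two permutations sum(x^2) == sum(y^2), so the square root
    # of their product in (2) is simply sum(x^2).
    return sum_xy / sum_x2
```

For example, with `X = [1, 2, 3, 4, 5]` and `Y = [2, 1, 4, 3, 5]` we have $\sum d^2 = 4$ and both functions give $r' = 4/5$.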

Compared with certain other tests of correlation based on order, such as $\sum |d|$,
or the number of inversions required to pass from one permutation of the 
numbers to the other, r' appears to be a sensitive index of relationship, since for 
a given value of n it possesses a greater number of distinct values. But to 
assert without qualification that r' or any other statistic is the best possible test 
of correlation based on order relations alone would be meaningless. Indeed, a 
particular type of bivariate distribution might well have a parameter represent¬ 
ing correlation whose significance could best be detected by a test adapted only 
to this particular bivariate distribution. However the rank correlation coefficient
has properties that point to its value in more general use than it has hereto¬
fore received. It has been regarded chiefly as a more easily calculable substitute 
for the product-moment coefficient r. Karl Pearson has remarked that the rank 
correlation coefficient is the easier to compute for samples smaller than approxi¬ 
mately forty, while r involves less labor for larger samples. 

The great value of the rank correlation coefficient appears to us to consist in
its use as a test of the existence of correlation, a test capable of exact interpreta¬ 
tion in terms of probability, without any assumption of a normal or other special 
bivariate distribution. If a bivariate distribution is specified by $f(x, y)\, dx\, dy$,
the condition of independence is that $f(x, y)$ shall be the product of a function of
$x$ by a function of $y$. If we put

$$\text{(5)} \qquad \xi = \int_{-\infty}^{\infty} \int_0^{x} f(x', y')\, dx'\, dy', \qquad \eta = \int_0^{y} \int_{-\infty}^{\infty} f(x', y')\, dx'\, dy',$$

using the inner integral sign in each case to correspond to the inner differential,
then each of the quantities $\xi$ and $\eta$ is distributed with uniform density from
$-\tfrac{1}{2}$ to $+\tfrac{1}{2}$; and if $x$ and $y$ are independent, then $\xi$ and $\eta$ are also independent.
The correlation $\rho'$ of $\xi$ with $\eta$ may be called the rank correlation of $x$ and $y$
in the population. It will vanish in case of independence. It is for this case
that we shall obtain in §§5, 6 and 7 the exact probability test for $r'$ in small
samples, the exact standard error and fourth moment, and asymptotic values for
the higher moments, with a demonstration that, for sufficiently large samples,
$r'$ can be treated as normally distributed. In §9 we shall present, in a revised
and simplified form, certain work of Karl Pearson relative to the estimation of
the correlation $\rho$ in a bivariate normal distribution, and apply the results to
discuss the question of the importance of the lost information when measure¬ 
ments are replaced by ranks. 

4. History of Rank Correlation Theory 

Rank correlation seems to have had its origin in the method of representing 
the distribution of a variate by grades or percentiles introduced by Francis 





Galton.³ Later Spearman⁴ proposed that rank be considered in place of the
variate, and suggested that the correlation of ranks be used as a measure of the
degree of dependence of the variates. Spearman also introduced the “footrule
of correlation” based on $\sum |d|$.

The principal memoir on rank correlation is by Karl Pearson.⁵ Assuming an
underlying normal distribution, Pearson obtains a relation equivalent to

$$\text{(6)} \qquad \rho = 2 \sin \frac{\pi}{6} \rho',$$

where $\rho$ is the correlation of $x$ and $y$ in the population, and $\rho'$ is the correlation of
uniformized variates $\xi$ and $\eta$ defined by (5). An estimate $r''$ of $\rho$ may be based
on the rank correlation $r'$, in accordance with (6), by writing

$$\text{(7)} \qquad r'' = 2 \sin \frac{\pi}{6} r'.$$
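Relation (7) is easily evaluated. The following sketch merely computes $r''$ from a given $r'$; it carries with it the assumption of an underlying bivariate normal distribution on which (6) rests, and the function name is illustrative.

```python
import math


def pearson_estimate_from_rank(r_prime):
    """r'' = 2 sin(pi * r' / 6), valid only under bivariate normality."""
    return 2.0 * math.sin(math.pi * r_prime / 6.0)
```

The transformation leaves the values $-1$, $0$ and $1$ unchanged and alters intermediate values only slightly; for $r' = .5$ it gives about $.518$.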

Pearson finds the first few terms of infinite series giving the standard errors of 
r' and r". He deals similarly with the estimation of correlation by means of 
$\sum |d|$. The paper contains a neat proof, attributed to Student, of the probable
error of r' under conditions of independence. It was this proof that sug¬ 
gested the analysis of §§6 and 7 below. This long memoir is very difficult to 
read and interpret accurately, owing chiefly to the failure to distinguish clearly 
between sample and population. 

The use of the probable error formulae is valid only if the distributions of r' 
and r" are sensibly normal. The question of approximate normality thus raised 
is investigated for the first time in the present paper. In order to use these 
formulae it is necessary to assume not only (1) that the underlying population
has the bivariate normal distribution (an assumption which requires more than 
that each variate be normally distributed), (2) that the first few terms of the 
infinite series are enough, and (3) that the distributions of r' and r" are practi¬ 
cally normal, but also (4) that sample values can be put for population values 
in the formulae, or that population values are known independently or can be 
assumed. It is probably this last condition that has been least understood and
has led to the greatest number of false conclusions regarding the significance of 
data. 

A note by W. C. Eells⁶ presents a compilation of numerous textbook versions
of the probable errors of $r'$ and $r''$, all differing from each other and from Pearson's.
Taking Pearson's formulae as correct, without discussing the assumptions
implicit in their use, Eells presents a table for calculating the probable
errors of $r$, $r'$ and $r''$.

³ Francis Galton, Natural Inheritance, Macmillan, 1889, Chaps. 4 and 5.

⁴ C. Spearman, The Proof and Measurement of Association Between Two Things, American
Journal of Psychology, Vol. 15, 1904.

⁵ Karl Pearson, On Further Methods of Determining Correlation, Drapers' Company
Research Memoirs, Biometric Series IV, Mathematical Contributions to the Theory of
Evolution, XVI, London, Dulau, 1907.

⁶ W. C. Eells, Formulas for Probable Errors of Coefficients of Correlation, Journal of the
American Statistical Association, Vol. 24, 1929, p. 170.

5. Significance of Rank Correlation in Small Samples 

If the variates are independent we may without loss of generality assign the 
values $1, 2, \cdots, n$ in order to $X_1, X_2, \cdots, X_n$, and regard the $Y$'s as made up
by any one of the $n!$ permutations of these numbers, all permutations being
equally probable. The probability of any particular value of $r'$ is thus proportional
to the number of permutations giving rise to this value. These may be
enumerated with the help of (4). Thus for $n = 2$, each of the values $\pm 1$ has
the probability $\tfrac{1}{2}$. For $n = 3$, the possible values of $r'$ are $-1$, $-\tfrac{1}{2}$, $\tfrac{1}{2}$, $1$, with
respective probabilities 1/6, 1/3, 1/3, 1/6. For $n = 4$ the values 1, 4/5, 3/5,
2/5, 1/5, 0 have the respective probabilities 1/24, 1/8, 1/24, 1/6, 1/12, 1/12.
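The enumeration just described is readily mechanized for small $n$. The sketch below runs through all $n!$ permutations and tabulates the exact probability of each value of $r'$ by means of (4); the function name is illustrative, and the method is practical only for small samples, since the number of permutations grows as $n!$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def exact_distribution(n):
    """Exact null distribution of r' as a dict {value: probability}."""
    counts = Counter()
    for perm in permutations(range(1, n + 1)):
        sum_d2 = sum((i - y) ** 2 for i, y in enumerate(perm, start=1))
        counts[1 - Fraction(6 * sum_d2, n ** 3 - n)] += 1
    total = factorial(n)
    return {r: Fraction(c, total) for r, c in sorted(counts.items())}
```

For $n = 3$ this reproduces the probabilities 1/6, 1/3, 1/3, 1/6 given above, and for $n = 4$ it gives the probabilities listed for the values from 1 down to 0, the negative values having by symmetry the same probabilities as the corresponding positive ones.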

From (2) it is evident that the distribution of $r'$ in case of independence is
symmetrical, since each permutation is exactly as probable as that of directly
opposite order, and since a change of sign of all the $x$'s or $y$'s changes the sign of
$r'$ without affecting its absolute value. It is clear also that the values $r' = \pm 1$,
corresponding to the two variates being in the same or opposite orders, are the
extreme ones, and have each a probability $1/n!$. The next greatest value of
$|r'|$ corresponds to the interchange of two consecutive individuals, who may be
selected in $n - 1$ ways, and makes $\sum d^2 = 2$. Thus the values $\pm(1 - 12/[n^3 - n])$
occur with probability $(n - 1)/n!$ each. Next to these, corresponding to
$\sum d^2 = 4$, are the values $\pm(1 - 24/[n^3 - n])$, whose probabilities are each
$(n - 2)(n - 3)/2(n!)$, since the number of pairs of mutually exclusive consecutive
pairs in a sequence of $n$ is $(n - 2)(n - 3)/2$. In like manner, but with
greater complexity, it appears that the probability of the value $1 - 36/[n^3 - n]$ is

$$\frac{(n - 3)(n - 4)(n - 5) + 12(n - 2)}{6(n!)}.$$

Easy calculation from these results shows that, if we require for significance a
probability $P = .01$ of a value of $|r'|$ as great as or greater than the value
observed, then for samples of 5 it is impossible to obtain a significant value;
for $n = 6$, significance requires that $r' = \pm 1$; and for $n = 7$ the significant
values of $|r'|$ are 25/28 and more. For the less stringent standard $P = .05$, a
unit correlation only is significant in a sample of 5; while 29/35 is not, but
31/35 is, significant in a sample of 6.
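The probabilities on which such statements rest can be obtained by direct counting. The sketch below returns the exact probability, in case of independence, of a value of $|r'|$ as great as or greater than that observed; it assumes a two-sided criterion, and the function name is illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def two_sided_tail(n, r_observed):
    """Exact P(|r'| >= |r_observed|) when all n! orders are equally probable."""
    threshold = abs(Fraction(r_observed))
    hits = 0
    for perm in permutations(range(1, n + 1)):
        sum_d2 = sum((i - y) ** 2 for i, y in enumerate(perm, start=1))
        r = 1 - Fraction(6 * sum_d2, n ** 3 - n)
        if abs(r) >= threshold:
            hits += 1
    return Fraction(hits, factorial(n))
```

Thus for $n = 6$ the value 31/35 has the probability 1/30, which is below .05, while 29/35 has the probability 7/120, which exceeds it.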

6. The Standard Error and Fourth Moment 

For large samples the exact calculation of probabilities becomes very laborious, 
and we are forced to resort to approximations. The first step in the available 
approximations is the determination of the standard deviation of the distribu¬ 
tion. The square of this quantity, the second moment or variance of r', may, 
since the mean value of r' in case of independence is zero, be written 


$$\mu_2 = E r'^2,$$





the symbol $E$ denoting the expectation or mean value of the quantity following.
The operation E has the properties that the expectation of a sum is the sum of 
the expectations of the terms, the expectation of the product of independent 
variates is the product of their expectations, and the expectation of the product 
of a constant by a variate is the product of the constant by the expectation of 
the variate. It is particularly to be noted that the first of these properties holds 
whether the terms of the sum are mutually independent or not. 

From (2) and (3) we have

$$\text{(8)} \qquad r' = \frac{12 \sum xy}{n^3 - n}.$$

Now we may regard $x_1, x_2, \cdots, x_n$ as taking the same values in all samples, these
values being centered at zero and differing consecutively by unity. The $y_i$ are
then variates, not independent of each other, taking this same set of values, but
in a manner varying from sample to sample by chance. For any particular $y$,
for example that associated with $x_1$, the chance distribution has moments of the
form

$$\text{(9)} \qquad E y^p = \frac{\sum x^p}{n} = \frac{s_p}{n},$$

if we denote by $s_p$ the sum of the $p$th powers of the $n$ numbers differing consecutively
by unity and centered at zero. It is clear that, for every odd value of $p$,
$s_p = 0$. Also, from (3),

$$s_2 = \frac{n^3 - n}{12}.$$

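In view of these facts, we have from (8),

$$\sigma_{r'}^2 = \mu_2 = \frac{E(\sum xy)^2}{s_2^2} = \frac{E y_1^2 \sum x_1^2 + 2\, E y_1 y_2 \sum x_1 x_2}{s_2^2},$$

where $\sum x_1 x_2$ stands for the sum of all the $n(n - 1)/2$ different terms obtained by
permuting the subscripts. We have

$$E y_1^2 = \frac{s_2}{n};$$

also

$$E y_1 y_2 = \frac{2 \sum x_1 x_2}{n(n - 1)}, \qquad 2 \sum x_1 x_2 = s_1^2 - s_2 = -s_2.$$

Combining these results we have:

$$\text{(10)} \qquad \sigma_{r'}^2 = \frac{1}{s_2^2} \left[ \frac{s_2^2}{n} + \frac{s_2^2}{n(n - 1)} \right] = \frac{1}{n - 1}.$$

This is the formula obtained by Student and incorporated in Pearson's memoir.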
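The steps leading to (10) can be checked by brute force for small $n$. The sketch below, offered only as a verification under the assumption that all $n!$ orders of the $y$'s are equally probable, computes $E\, y_1 y_2$ and the variance of $r'$ by complete enumeration in exact arithmetic; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def centered_ranks(n):
    """The n values differing consecutively by unity and centered at zero."""
    mean = Fraction(n + 1, 2)
    return [k - mean for k in range(1, n + 1)]


def expectation_over_permutations(n, func):
    """Mean of func(x, y) over all n! equally probable orders y of the values x."""
    x = centered_ranks(n)
    total = sum(func(x, y) for y in permutations(x))
    return total / factorial(n)


def pair_expectation(n):
    """E(y1 * y2), which the text gives as -s2 / (n (n - 1))."""
    return expectation_over_permutations(n, lambda x, y: y[0] * y[1])


def variance_of_r_prime(n):
    """E(r'^2) with r' = sum(xy) / s2, to be compared with 1 / (n - 1)."""
    s2 = Fraction(n ** 3 - n, 12)
    return expectation_over_permutations(
        n, lambda x, y: (sum(a * b for a, b in zip(x, y)) / s2) ** 2)
```

For $n = 5$, where $s_2 = 10$, the enumeration gives $E\, y_1 y_2 = -1/2$ and a variance of $1/4$, in agreement with (10).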





Any desired moment of r' may be obtained in this manner. However the 
complexity of the calculation increases rapidly with the order of the moment, 
and the derivation of even the fourth moment is too long to be included in this 
paper. The value obtained for the fourth moment is

$$\mu_4 = \frac{3(25n^4 - 13n^3 - 73n^2 + 37n + 72)}{25n(n + 1)^2 (n - 1)^3}.$$

It will be observed immediately that the kurtosis, $\beta_2 = \mu_4/\mu_2^2$, approaches the
normal value 3 as $n$ increases.
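The fourth moment and the kurtosis may be tabulated directly from the formula. The sketch below assumes $n \ge 2$ and uses $\mu_2 = 1/(n - 1)$ from (10); the function names are illustrative.

```python
from fractions import Fraction


def fourth_moment(n):
    """mu_4 of r' in case of independence, from the formula in the text."""
    numerator = 3 * (25 * n ** 4 - 13 * n ** 3 - 73 * n ** 2 + 37 * n + 72)
    denominator = 25 * n * (n + 1) ** 2 * (n - 1) ** 3
    return Fraction(numerator, denominator)


def kurtosis(n):
    """beta_2 = mu_4 / mu_2^2, with mu_2 = 1 / (n - 1)."""
    return fourth_moment(n) * (n - 1) ** 2
```

For $n = 3$ the formula gives $\mu_4 = 3/8$, which agrees with the value found from the exact distribution ($\pm 1$ with probability 1/6 each and $\pm \tfrac{1}{2}$ with probability 1/3 each). The kurtosis rises slowly toward 3 and is still slightly below it for samples of a thousand.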

For values of $n$ which are not small enough for the exact probabilities to be
computed easily, the Tchebycheff inequality,

$$\text{(11)} \qquad P \le \frac{1}{(n - 1)\, r'^2},$$

where $P$ is the probability of a deviation exceeding $r'$, will often be of service.
Thus, if n = 25 and r' = .9, (11) shows that P is less than .05, so that the 
evidence for existence of a relationship should by an ordinary standard be re¬ 
garded as significant. However this does not in general give an accurate 
approximation to P, nor do the similar inequalities involving the higher moments. 


7. The Higher Moments and the Approach to Normality 

A general moment of $r'$ of even order is defined by

$$\text{(12)} \qquad \mu_{2a} = E r'^{2a} = \frac{1}{s_2^{2a}}\, E (x_1 y_1 + x_2 y_2 + \cdots + x_n y_n)^{2a}.$$


When the parenthesis is expanded we may take the expectation term by term,
regarding the $x$'s as constants. Now

$$E y_1^p = \frac{\sum x_1^p}{n}, \qquad E y_1^p y_2^q = \frac{\sum x_1^p x_2^q}{n(n - 1)},$$

and so forth, the sums on the right in the numerators being symmetric functions
of the constants $x$, taken over all different terms obtained from that written by
permuting subscripts, and the denominator being in each case the number of
terms in the numerator. Thus

$$\text{(13)} \qquad \mu_{2a} = \frac{1}{s_2^{2a}} \left\{ A\, \frac{(\sum x_1^{2a})^2}{n} + B\, \frac{(\sum x_1^{2a-1} x_2)^2}{n(n - 1)} + C\, \frac{(\sum x_1^{2a-2} x_2 x_3)^2}{n(n - 1)(n - 2)} + \cdots \right\},$$

where the coefficients $A, B, \cdots$ depend on $a$ but not on $n$. With a view to
determining the leading term in the expansion of $\mu_{2a}$ in powers of $n^{-1}$, we shall
select the term in the curly brackets in (13) of highest degree, meaning by the
degree of one of these rational fractions the excess of the degree of the numerator
over that of the denominator.

The symmetric functions are well known to be expressible as polynomials in 





the power-sums $s_p$. In each term of such a polynomial corresponding to one of
our symmetric functions of degree $2a$, the sum of the subscripts of the $s_p$'s must
be $2a$, since if all the $x$'s are multiplied by a constant such a polynomial must be
multiplied by the $2a$th power of the constant. Now $s_p$ is a polynomial of degree
$p + 1$ in $n$ if $p$ is even, but vanishes identically if $p$ is odd. Consequently the
degree in $n$ of any of the terms of the polynomial in the power-sums must exceed
$2a$ by the number of power-sums appearing in this term. Therefore, the term of
highest degree in $n$ obtained, when one of the symmetric functions is expressed
in terms of the $s_p$'s and thence in terms of $n$, must contain the greatest possible
number of the $s_p$'s. If $p$ is the number of distinct $x$'s in a term of one of our
symmetric functions, this function may be written in the form

$$\text{(14)} \qquad \sum x_1^{a_1} x_2^{a_2} \cdots x_p^{a_p} = c_0\, s_{a_1} s_{a_2} \cdots s_{a_p} - c_1\, s_{a_1 + a_2} s_{a_3} \cdots s_{a_p} - c_2\, s_{a_1 + a_3} s_{a_2} s_{a_4} \cdots s_{a_p} - \cdots - c'\, s_{a_1 + a_2 + a_3} s_{a_4} \cdots s_{a_p} - \cdots,$$

where $a_1 + a_2 + \cdots + a_p = 2a$, and the $c$'s do not involve $n$. In the right-hand
member of the equation above, the first term involves $p$ of the power-sums, while
the remaining terms involve fewer of them. Hence, if all the indices $a_1, a_2, \cdots, a_p$
are even, the first term is a polynomial of degree $2a + p$ in $n$, while the remaining
terms are polynomials of lower degree, and are therefore negligible in comparison
with the first term when $n$ is sufficiently large. But if any of the indices $a_i$
are odd, the first term vanishes identically, and the degree of (14), regarded as a
polynomial in $n$, is then less than $2a + p$. Since the sum of the indices is $2a$, the
number of odd ones among them must be even; let this number be denoted by $2q$,
and let the number of even indices be $m$. Then $p = m + 2q$. The terms of
highest degree in the right-hand member of (14) must be obtained by grouping
the odd indices in pairs to form the subscripts of the $s$'s. The degree is therefore
$2a + m + q$.

In (13), the degree of the denominator of each term in the curly brackets is the
number of distinct $x$'s appearing in a term of the symmetric function in the
numerator, namely $p$, or $m + 2q$. Hence the excess of the degree of the numerator
over that of the denominator is

$$2(2a + m + q) - (m + 2q) = 4a + m.$$

This will be a maximum when $m$ is a maximum, and is independent of $q$. The
maximum value of $m$ is $a$, and occurs only for the symmetric function

$$\text{(15)} \qquad \sum x_1^2 x_2^2 \cdots x_a^2.$$

The term involving this function is therefore the only one in the right-hand
member of (13) we need consider. Since this symmetric function contains
$n(n - 1)(n - 2) \cdots (n - a + 1)/(a!)$ terms, and since in the expansion of

$$(x_1 y_1 + x_2 y_2 + \cdots + x_n y_n)^{2a}$$






the coefficient of $x_1^2 x_2^2 \cdots x_a^2\, y_1^2 y_2^2 \cdots y_a^2$ is, by the multinomial theorem,
$(2a)!/2^a$, we have from (13),

$\mu_{2a} \sim \frac{1}{s_2^{2a}} \cdot \frac{(2a)!\; a!\; (\Sigma\, x_1^2 x_2^2 \cdots x_a^2)^2}{2^a\, n^a}.$

To evaluate the symmetric function (15), so far as the term of highest order in
n is concerned, we of course need only the first term of (14), which reduces in
this case to

$\Sigma\, x_1^2 x_2^2 \cdots x_a^2 \sim c_a s_2^a.$

In the expansion of $s_2^a = (x_1^2 + x_2^2 + \cdots + x_n^2)^a$, the coefficient of (15) is a!,
which is therefore the reciprocal of $c_a$. Thus we obtain

$\mu_{2a} = \frac{(2a)!}{a!\, 2^a} \left[ \frac{1}{n^a} + \cdots \right],$

the terms dropped being of higher order in $n^{-1}$.


The 2a-th moment of the quotient of r' by its standard error, that is, of
$r'\sqrt{n-1}$, is $(n-1)^a$ times that of r', and therefore approaches, as n increases,
the value

(16) $\frac{(2a)!}{a!\, 2^a}.$

The odd moments are all zero because of the symmetry of the distribution of r'.
But (16) is the moment of order 2a of a normal distribution of unit variance and
zero mean. It follows therefore from the Second Limit Theorem of Probability⁷
that the distribution tends to normality as n increases; that is, for any real
number X, the limit as n tends to infinity of the probability that $r'\sqrt{n-1} < X$ is

$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{X} e^{-t^2/2}\, dt.$

The normality of the limiting distribution of the rank correlation coefficient 
is rather remarkable, since r', unlike the product-moment correlation coefficient 
r and other statistics in common use, is neither a mean of independent quantities 
nor a function of such means, so that the ultimate normality just established is 
not a corollary of known general theorems. It is unexpected also because the
exact distribution of r' for samples smaller than six might lead one to anticipate 
a bimodal distribution. 

An outstanding problem is to determine whether the distribution of r' in
samples from a bivariate normal distribution for which ρ ≠ 0 converges to
normality. Without such an approach to normality, the probable error formulae
discovered by Pearson are useless. Another problem is to find convenient and
accurate approximations to the distribution of r', for moderate values of n, with
close limits of error. A table calculated along the lines suggested in §5 would
be very useful.

⁷ First proved by Markoff. Cf. Fréchet and Shohat, A Proof of the Generalized Second
Limit Theorem in the Theory of Probability, Transactions of the American Mathematical
Society, Vol. 33, 1932, pp. 533-543.

8. Combination of Rank and Contingency Methods

Suppose that a thousand school children are examined at the end of a course 
of instruction, and rated with the grades A, B, C and D. Five hundred of 
these children are of each sex. The results are: 


                          A       B       C       D     Totals

Boys.................    190     200      80      30      500
Girls................    220     200      60      20      500
Totals...............    410     400     140      50     1000
Proportion of Girls..   .537    .500    .429    .400     .500


Regarding this as a 2 × 4 contingency table with three degrees of freedom, we
calculate $\chi^2 = 7.52$, the probability of which value being exceeded by chance
is .0670. The indications of a significant difference in distribution of grades
between sexes may thus, if one holds to the .05 standard and uses only the $\chi^2$
test, be regarded as not quite significant. There is, however, additional evidence
in the fact that the proportion of girls diminishes steadily as we pass down the
scale of grades. If we treat excellence in the subject as one variate and the
proportion of girls in a group as another, we have a rank correlation of unity,
with a sample of four. The probability of a correlation of ±1 is .083, which
also, by itself, would not be considered significant. But we may combine the
two pieces of evidence by the method given by Fisher.⁸ The process consists
of adding the natural logarithms of the two probabilities, doubling the sum and
changing its sign, and treating the result as having the $\chi^2$ distribution with
four degrees of freedom. This gives a probability in the neighborhood of .03,
which would be judged significant.
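
The combination just described can be checked numerically. The following Python sketch is an illustration only and not part of the original computation: it takes the two probabilities exactly as quoted in the text (.0670 and .083), the function names are hypothetical, and it relies on the closed form of the χ² tail for an even number of degrees of freedom so that only the standard library is needed.

```python
import math

def chi2_tail_even_df(x, df):
    """Upper-tail probability of chi-square with an even number of d.f."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    # P(chi2_df > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
    return math.exp(-half) * sum(half ** j / math.factorial(j)
                                 for j in range(df // 2))

def fisher_combined_probability(probabilities):
    """Fisher's method: -2 * sum(ln p), referred to chi-square with 2k d.f."""
    statistic = -2.0 * sum(math.log(p) for p in probabilities)
    return statistic, chi2_tail_even_df(statistic, 2 * len(probabilities))

# Probability of a rank correlation of +1 or -1 in a sample of four:
# 2 of the 4! = 24 equally likely permutations.
p_rank = 2 / math.factorial(4)

# Probabilities as quoted in the text (assumed inputs, not recomputed here).
statistic, combined = fisher_combined_probability([0.0670, 0.083])
print(round(p_rank, 3), round(statistic, 2), round(combined, 3))
```

With the quoted inputs the combined probability comes out near .034, in agreement with the statement that it lies in the neighborhood of .03.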

Similar cases are very common. The value of $\chi^2$ is unchanged if the columns
are permuted in any way, whereas r' depends solely on which of the possible
permutations actually exists. Thus the two tests are independent, a property
needed for the combination by the above method.

⁸ R. A. Fisher, Statistical Methods for Research Workers, 4th and 6th editions, Art. 21.1.

9. Efficiency of Replacement of Measures by Ranks, and the Estimation of
ρ from Rank Correlations for a Normal Population

Consider a population with a normal distribution in two variates x and y,
each of which we shall without loss of generality assume to be of unit variance
and zero mean. The density distribution is then specified by z dx dy, where

(17) $z = \frac{1}{2\pi\sqrt{1-\rho^2}}\; e^{-\frac{1}{2(1-\rho^2)}\,(x^2 - 2\rho x y + y^2)},$

where ρ is the correlation of x and y, or the variate correlation. By ξ and η,
as in §3, we denote the uniformized variates defined by (5), i.e., functions
respectively of x and y having distributions of uniform density from −½ to +½.
Then ξ and η will each have the variance 1/12. The rank correlation ρ' in the
population is the correlation of ξ and η; consequently

(18) $\rho' = 12 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \xi\eta\, z\, dx\, dy.$

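Thus ρ' is a function of ρ, which obviously vanishes when ρ = 0.
From (17) the identity

(19) $\frac{\partial z}{\partial \rho} = \frac{\partial^2 z}{\partial x\, \partial y}$

is readily calculated. With its help we have from (18) and integrations by parts,

(20) $\frac{d\rho'}{d\rho} = 12 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \xi\eta\, \frac{\partial z}{\partial \rho}\, dx\, dy = 12 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \xi\eta\, \frac{\partial^2 z}{\partial x\, \partial y}\, dx\, dy = 12 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{d\xi}{dx}\frac{d\eta}{dy}\, z\, dx\, dy.$
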

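Now since x and y are normally distributed with unit variance and zero means,
the uniformized variates (5) take the form

$\xi = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-t^2/2}\, dt, \qquad \eta = \frac{1}{\sqrt{2\pi}} \int_0^y e^{-t^2/2}\, dt.$

Therefore

$\frac{d\xi}{dx} = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad \frac{d\eta}{dy} = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}.$

Substituting these values and (17) in the last integral in (20) we have

$\frac{d\rho'}{d\rho} = \frac{12}{4\pi^2\sqrt{1-\rho^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{(2-\rho^2)x^2 - 2\rho x y + (2-\rho^2)y^2}{2(1-\rho^2)}}\, dx\, dy.$
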

The double integral, as is well known, equals π divided by the square root of the
discriminant of the quadratic form in the exponent. This gives

$\frac{d\rho'}{d\rho} = \frac{6}{\pi\sqrt{4-\rho^2}}.$

Therefore, since ρ' vanishes with ρ,

$\rho' = \frac{6}{\pi} \sin^{-1}\frac{\rho}{2},$

or

$\rho = 2 \sin\frac{\pi\rho'}{6}.$

This is essentially the process used by Pearson.

The last equation suggests that an estimate r'' of ρ be based on the rank
correlation r' by means of the relation

$r'' = 2 \sin\frac{\pi r'}{6}.$

Prefixing a δ to denote a deviation of sample from population value we have by a
Taylor expansion,

$\delta r'' = \frac{\pi}{3}\cos\frac{\pi\rho'}{6}\; \delta r' + \cdots,$

the terms dropped being of higher order in δr' than those written, and consequently
of higher order in $n^{-1}$. Squaring, taking the expectation, and ignoring
the terms of higher order, we have for the case ρ = ρ' = 0, by (10),

$\sigma^2_{r''} = E(\delta r'')^2 = \frac{\pi^2}{9(n-1)},$

approximately.

The last result enables us to measure the loss of information, at least for large
samples, that results from neglecting the exact values of the variates and using
only ranks. The product-moment correlation coefficient r has, if ρ = 0, the
exact variance

$\frac{1}{n-1},$

the ratio of which to $\sigma^2_{r''}$ tends as n increases to $9/\pi^2$. Thus the efficiency of the
rank correlation method in estimating ρ, if ρ is really zero, is $9/\pi^2 = .9119$.
This means that the product-moment correlation is approximately as sensitive
a test of the existence of a relationship in a normally distributed population with
91 cases as the rank correlation with 100 cases.
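
The relations of this section lend themselves to a short numerical illustration. The Python sketch below assumes the results derived above, namely ρ' = (6/π) arcsin(ρ/2), the estimate r'' = 2 sin(πr'/6), and the large-sample variance π²/(9(n − 1)) at ρ = 0; the function names are hypothetical and the sample size 100 is chosen only to mirror the closing remark.

```python
import math

def rho_from_rank_correlation(rank_corr):
    """Estimate r'' = 2 sin(pi r'/6) of the normal correlation from r'."""
    return 2.0 * math.sin(math.pi * rank_corr / 6.0)

def rank_correlation_from_rho(rho):
    """Population relation rho' = (6/pi) arcsin(rho/2)."""
    return 6.0 / math.pi * math.asin(rho / 2.0)

def large_sample_variance_of_estimate(n):
    """Approximate variance pi^2 / (9 (n - 1)) of r'' when rho = 0."""
    return math.pi ** 2 / (9.0 * (n - 1))

# Efficiency: variance of r divided by variance of r'' (both at rho = 0).
n = 100
efficiency = (1.0 / (n - 1)) / large_sample_variance_of_estimate(n)
print(round(efficiency, 4))                      # the ratio 9 / pi^2
print(round(rho_from_rank_correlation(0.5), 4))  # r'' for r' = 0.5
```

The ratio does not depend on n in this approximation, which is why the efficiency can be quoted as a single number.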

The efficiency of r' will of course be different for non-normal populations, and
also for normal populations with ρ ≠ 0. But if the form of the population is
known, this knowledge may always be used to supplement the ranks to obtain a
more accurate estimate of correlation, or test of relationship. This fact deserves
some attention, since a superficial observation of the coincidence of the formula (1)
for the leading term of the variance of an arbitrary uncorrelated population,
and the leading term of the formula (10) for the variance of the rank correlation,
might suggest that r' is as accurate as r. But it may be surmised that the 9%
loss of information found for the bivariate normal distribution is the greatest loss
of information in using r' in place of r to test for independence, since for
non-normal populations the most efficient estimate of the correlation will not usually
be r, but a more complicated function of the observations. Certainly where
there is complete absence of knowledge of the form of the bivariate distribution,
and especially if it is believed not to be normal, the rank correlation coefficient
is to be strongly recommended as a means of testing the existence of relationship.

Columbia University. 



THE ELIMINATION OF PERPETUAL CALENDARS 

By John L. Roberts 

If we wish to find the day of the week for any date, one way to solve the 
problem is to use a perpetual calendar. Another way to solve the problem is 
to calculate the day of the week by mathematical methods. In the past these 
mathematical methods have been so complicated that it has been much more 
convenient to use a perpetual calendar. This explains why some people have 
put themselves to the expense of buying perpetual calendars. The purpose of 
this article is to provide a mathematical method which is so simple that the 
entire calculation can be done mentally and which is as convenient as a per¬ 
petual calendar. In this article this mathematical method is applied to the 
Gregorian, Julian, and World calendars. Since a great many records have 
been made using the Julian and Gregorian calendars, the adoption of the World 
calendar would not completely eliminate the usefulness of applying the mathe¬ 
matical method to the historical calendars. The mathematical method also 
shows to what extent the World calendar is a simplification; this is important 
because proposals to reform the present calendar are attracting world-wide 
attention. 

In the theory of numbers occurs the expression,

a ≡ b mod p, (1)

which is read a is congruent to b modulo p, and which means that the difference
of a and b is divisible by p. Since p in this article is always equal to 7,
it is convenient to represent (1) by

a ≡ b. (2)

Assume m stands for any number which represents any monthday of any
month. Assume w stands for any number which represents any day of the
week. It is assumed that 7 stands for Sunday, 1 for Monday, 2 for Tuesday,
etc. It is assumed that the constant c for any month is the value of m at the
first Sunday in that month. Then (2) becomes

w ≡ m − c, (3)

which enables us to find w if m is known provided the constant c is known for
the month in question. Consequently, all we need to complete our theory is
to discover a method of finding c for any possible month.

First, there will be discussed rules for finding c for any month of the Gregorian 
calendar in 1935. An inspection of the calendar shows that c for December is 



equal to 1. Since November has 30 days, we can find c for it by adding 2, 
which is congruent to 30, to the c for December. Since the number of days in 
September, October, and November is 91, which is congruent to zero, the c’s 
for September and December have the same value. In like manner, since c 
for September is 1, the c for June is 2, and the c for March is 3. We now have 
all the theory which is necessary to find w at any date in 1935. For example, 
suppose we wish to find w for April 17, and know that the c for December is 1. 
Then, by adding 2 we find that the c for March is 3. We are now in position 
easily to calculate that the c for April is 7. Applying (3) we find that w at 
April 17 is 3, which stands for Wednesday. 

All that is necessary to complete our theory of the Gregorian calendar is to 
find rules for finding c for December of any possible year, because, if this is 
known, we can find c for any month in that year by the method used for 1935. 
It is convenient to represent the expression, “c for December 1935” by “C for
1935.” In like manner C for any calendar year means c for December of that
year. Since C for 1935 is 1 and since the number of days in 1936 is 366, which
is congruent to 2, subtracting this 2, we find that C for 1936 is 6, because −1
is congruent to 6. Knowing C for 1936, we deduce that C for 1940, which is
four years later, is 1, because 6 + 2 is congruent to 1; and that C for 1928 is 2, 
found by subtracting 4. The C’s for 1900, 1928, 1956, and 1984 are equal. 
Full centuries in order to be leap years must be divisible by 400. Since C 
for 1900 is 2, we find by adding 1 that C for 2000 is 3. Knowing C for 2000, 
we deduce by adding 2 that C for 2100 is 5. 1600, 2000, and 2400 have the 
same value of C. If it is assumed that the length of the tropical year is exactly 
365.2425 days, we have all the theory which is necessary to find C for any 
possible year. Although this assumption contains a small error, any further 
discussion of it would hardly be of any practical interest. The foregoing theory 
provides complete methods for finding w by means of a series of steps, which 
are so simple that the entire calculation can be done mentally. For example, 
suppose we wish to find w for November 29, 1888. Each of the C's for 1800
and 1884 is 7. Therefore, C for 1888 is 2, which is congruent to 7 + 2. Adding
2, c for November of this year is 4. Applying (3), we find that w at November
29, 1888 is 4, which stands for Thursday. In order to calculate mentally w
for any date of the Gregorian calendar, it is only necessary for me to remember
the foregoing mathematical method and to remember I was born on November
29, 1888, a Thanksgiving Day.
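
The rules above can be written out as a short program. The following Python sketch is only one possible transcription, made under stated assumptions: it starts from the single observed fact that c for December 1935 is 1, steps year by year rather than using the century shortcuts of the text, and the function names are hypothetical. The standard library's `datetime` (which numbers Monday as 1 and Sunday as 7, as here) serves as an independent check.

```python
import datetime

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def is_gregorian_leap(year):
    # Full centuries are leap years only when divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def days_in_month(year, month):
    if month == 2 and is_gregorian_leap(year):
        return 29
    return DAYS_IN_MONTH[month - 1]

def days_in_year(year):
    return 366 if is_gregorian_leap(year) else 365

def to_1_7(value):
    """Reduce modulo 7 to the range 1..7 (7 plays the role of 0)."""
    return (value - 1) % 7 + 1

def december_constant(year):
    """C for the year: monthday of the first Sunday in December."""
    c = 1                          # C for 1935, read from the calendar
    for y in range(1936, year + 1):
        c -= days_in_year(y)       # each later year subtracts its length
    for y in range(1935, year, -1):
        c += days_in_year(y)       # each earlier year adds the length skipped
    return to_1_7(c)

def month_constant(year, month):
    """c for the month, found by stepping back from December."""
    c = december_constant(year)
    for earlier in range(11, month - 1, -1):
        c += days_in_month(year, earlier)
    return to_1_7(c)

def weekday_number(year, month, day):
    """w = m - c (mod 7), with 7 = Sunday, 1 = Monday, 2 = Tuesday, ..."""
    return to_1_7(day - month_constant(year, month))

# The two worked examples of the text, with datetime as a cross-check.
print(weekday_number(1935, 4, 17), datetime.date(1935, 4, 17).isoweekday())
print(weekday_number(1888, 11, 29), datetime.date(1888, 11, 29).isoweekday())
```

Stepping one year at a time is slower than the mental shortcuts (28-year and 400-year repetitions) but makes each congruence in the text directly visible.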

Deplorable changes were made in the Julian calendar between 45 B.C. and 
1 A.D. Also it was not until 325 A.D. that the use of the 7-day week became 
general throughout the Roman Empire, gradually supplanting the old division 
of the month into Calends, Nones, and Ides. Therefore, in order to save space, 
the application of our theory prior to 1 A.D. is left to the reader. Starting 
with this year it is only necessary to discover a rule for finding the C's of the 
Julian calendar for the full centuries, because the rules of the Gregorian calendar 
apply to all other years. October 5, 1582, Old Style was the same day as
October 15, 1582, New Style; the Gregorian calendar was born at this date.
December 17, 1600, New Style was a Sunday, and was the same day as December 7,
1600, Old Style. Therefore, C for 1600, Old Style is 7. It is now a very simple
matter to complete our theory of the Julian calendar. Since C for 1600 is 7,
subtracting 1, C for 1500 is 6. 200, 900, and 1600 have the same value of C.

In the case of the World calendar the c's for the three months of each of the
equal quarters can be found as follows. For the first month c is 1. Therefore,
c for the second month is 5, which is congruent to 1 − 3. Subtracting 2
from this 5, we find that c for the third month is 3.



NOTES 


ON STANDARD ERROR FOR THE LINE OF MUTUAL
REGRESSION

By Y. K. Wong

1. In Pearson's On Lines and Planes of Closest Fit to Systems of Points in Space,¹
he establishes a formula for the mean square residual for the best fitting line in
q-space:

(1) (mean sq. residual)^ = ^ 

where 2/2max is the length of the maximum axis of the correlation ellipse in 
g-space, and A is the correlation determinant.^ 

In the present paper, we consider a 2-dimensional case, and shall call the mean
sq. residual the standard error, denoted by $S_N$.

In 2-dimensional space, a correlation ellipse is

(2) $a x^2 + 2hxy + b y^2 + c = 0,$

where

(2a) $a = \sigma_y^2,\quad b = \sigma_x^2,\quad h = -r_{xy}\sigma_x\sigma_y = -p_{xy},\quad c = -\sigma_x^2\sigma_y^2.$

Pearson gives in the 2-dimensional space the following formula for $S_N$:

(3) $S_N = \sigma_x\sigma_y/\text{semi-major axis of equation (2)}.$

Expression (3) can be readily deduced from (1). This paper aims to present
some formulae for $S_N$, more convenient for practical computation, and also to call
attention to a misprint in Pearson's paper.


2. From analytic geometry, we see that the angle φ between the major axis of
the ellipse (2) and the x-axis is given by

(4) $\tan 2\varphi = 2h/(a - b).$

By rotation of the axes, equation (2) can be written in the form

(5) $a'x^2 + b'y^2 + c = 0,$

where

(5a) $a' = a\cos^2\varphi + 2h\sin\varphi\cos\varphi + b\sin^2\varphi > 0,$
     $b' = a\sin^2\varphi - 2h\sin\varphi\cos\varphi + b\cos^2\varphi > 0.$

¹ Philosophical Magazine, 6th Series, II (November, 1901), p. 569.



Lemma 1. The value of a' given by (5a) is less than b'.

To prove this lemma, we find from (4) and (5)

$a' + b' = a + b,\qquad a' - b' = 2h/\sin 2\varphi = -2p_{xy}/\sin 2\varphi,$

and hence

(6) $2a' = a + b - 2p_{xy}/\sin 2\varphi,\qquad 2b' = a + b + 2p_{xy}/\sin 2\varphi.$

Since both a and b are positive, the lemma will be proved if we can show that
$p_{xy}/\sin 2\varphi$ is a positive quantity. By (2a), $p_{xy} = r_{xy}\sigma_x\sigma_y$, in which $\sigma_x$, $\sigma_y$ are
positive; hence the sign of $p_{xy}$ depends upon the sign of $r_{xy}$. If $r_{xy} < 0$, then $\varphi > \pi/2$,
and 2φ is of such a nature that $3\pi/2 < 2\varphi < 2\pi$. It follows that sin 2φ < 0, and
hence $p_{xy}/\sin 2\varphi$ is positive. On the other hand, if $r_{xy} > 0$, then φ is such that
$0 < 2\varphi < \pi$, and hence sin 2φ > 0. It follows that $p_{xy}/\sin 2\varphi$ is positive
independent of the sign of $r_{xy}$.

Lemma 2. The square of the mean square residual is equal to a', and hence

$S_N^2 = \sigma_y^2\cos^2\varphi - 2p_{xy}\sin\varphi\cos\varphi + \sigma_x^2\sin^2\varphi = \tfrac{1}{2}(\sigma_x^2 + \sigma_y^2) - p_{xy}/\sin 2\varphi.$

For from (5), we obtain (semi-major axis)² = −c/a' = $\sigma_x^2\sigma_y^2/a'$. Substituting
this into (3), we obtain $S_N^2 = a'$. The balance of the lemma follows from (5a),
(6), and (2a).

Lemma 3. For every $r_{xy}$, we have

(7) $\sin 2\varphi = 2p_{xy}/\sqrt{K},\qquad K = (\sigma_x^2 - \sigma_y^2)^2 + 4p_{xy}^2.$

For, from (4), we find $\sin 2\varphi = -2p_{xy}/(\pm\sqrt{K}) = -2r_{xy}\sigma_x\sigma_y/(\pm\sqrt{K})$. By the
argument given in the demonstration of Lemma 1, we see that $r_{xy}$ and sin 2φ should
be of the same sign. Hence the negative sign is chosen before the radical.
From Lemma 2 and (7), we have the formula given by Pearson:

(8) $2S_N^2 = (\sigma_x^2 + \sigma_y^2) - \sqrt{K}.$

3. We are going to establish several more formulae for $S_N$. From (4), we
have $2h\tan\varphi = -(a - b) \pm \sqrt{K}$. The sign before the radical is
determined in such a way that tan φ has the same sign as $r_{xy}$. By the reasoning
given in Lemma 1, the negative sign is chosen. Thus

$-2p_{xy}\tan\varphi = (\sigma_x^2 - \sigma_y^2) - \sqrt{K} = 2S_N^2 - 2\sigma_y^2,$

or

$2(\sigma_y^2 - p_{xy}\tan\varphi) = \sigma_x^2 + \sigma_y^2 - \sqrt{K}.$

This proves that

(9) $S_N^2 = \sigma_y^2 - p_{xy}\tan\varphi.$



Similarly, we have

(10) $S_N^2 = \sigma_x^2 - p_{xy}\cot\varphi.$

For computation, (9) and (10) are more convenient than (8). When the line
of mutual regression is determined, it is known that tan φ (denoted by B) is
equal to the slope of that line, and hence cot φ (= 1/B) is equal to the reciprocal
of the slope. Then we can write (9) and (10) as follows:

(11) $S_N^2 = \sigma_y^2 - p_{xy}B,$

(12) $S_N^2 = \sigma_x^2 - p_{xy}/B.$

The second formula given in Lemma 2 is simpler than (8), but not as simple as
(11) and (12).

For computation, it is convenient to find φ from the equation

$\tan 2\varphi = \frac{2p_{xy}}{\sigma_x^2 - \sigma_y^2} = H,$

i.e.,

$2\varphi = \arctan H.$

Since sin 2φ and $r_{xy}$ are of the same sign, we can determine the value of φ from
the preceding equation by inspection, though arctan H is a multiple-valued
function. After the determination of φ, we can obtain

B = tan φ.

Then we can compute $S_N$ either from (9), (11), or (10), (12).
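
The agreement of the several formulae can be illustrated numerically. The Python sketch below is a check under stated assumptions, not the author's computing scheme: it takes (8) as $2S_N^2 = \sigma_x^2 + \sigma_y^2 - \sqrt{K}$, (11) as $S_N^2 = \sigma_y^2 - p_{xy}B$ and (12) as $S_N^2 = \sigma_x^2 - p_{xy}/B$, and it selects the branch of the arctangent with the two-argument `atan2`, which makes sin 2φ agree in sign with $r_{xy}$. The function name and sample values are hypothetical, and $r_{xy}$ must not be zero because (12) divides by B.

```python
import math

def mutual_regression_mean_square(sigma_x, sigma_y, r_xy):
    """Return S_N^2 computed from (8), (11) and (12) for comparison."""
    p_xy = r_xy * sigma_x * sigma_y
    K = (sigma_x ** 2 - sigma_y ** 2) ** 2 + 4.0 * p_xy ** 2

    # Formula (8): Pearson's expression with the radical.
    from_eq8 = 0.5 * ((sigma_x ** 2 + sigma_y ** 2) - math.sqrt(K))

    # tan 2phi = 2 p_xy / (sigma_x^2 - sigma_y^2); atan2 picks the value of
    # 2phi whose sine has the sign of p_xy, as the text requires.
    phi = 0.5 * math.atan2(2.0 * p_xy, sigma_x ** 2 - sigma_y ** 2)
    B = math.tan(phi)               # slope of the line of mutual regression

    from_eq11 = sigma_y ** 2 - p_xy * B
    from_eq12 = sigma_x ** 2 - p_xy / B
    return from_eq8, from_eq11, from_eq12

print(mutual_regression_mean_square(2.0, 1.0, 0.5))
print(mutual_regression_mean_square(1.0, 3.0, -0.5))
```

The three values in each tuple should coincide up to rounding, which is the content of (9) through (12).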

There is a very interesting fact furnished by (11) and (12). These two
formulae are, in fact, generalizations of the following two well known ones:

(a) $S_y^2 = \sigma_y^2(1 - r^2)$

(b) $S_x^2 = \sigma_x^2(1 - r^2),$

where $S_y$ is the standard error of the line of regression when y is used as dependent
variable and x as independent variable, and similarly for $S_x$. It is clear that the
line of mutual regression may be looked upon as a generalization of the other
two lines of regression when we use y or x as dependent variable. So the slope
B of the line of mutual regression is a generalization of $b_{yx} = r\sigma_y/\sigma_x$ and
$b_{xy} = r\sigma_x/\sigma_y$, where the subscript yx means y on x and xy, x on y. If we use x as
independent variable, then we must obtain $b_{yx}$ instead of B. Hence substituting the
formula of $b_{yx}$ instead of B into (11), we obtain, after a simple reduction, the same
result as given by (a). On the other hand, if we use y as independent variable, we
must obtain $b_{xy}$ instead of 1/B. The result (b) follows when $b_{xy}$ is put in the place of
1/B in (12). The generalization perhaps can be seen more clearly if we write
(a) and (b) in slightly different forms:

(a') $S_y^2 = \sigma_y^2 - p_{xy}b_{yx},$

(b') $S_x^2 = \sigma_x^2 - p_{xy}b_{xy}.$


4. The misprint in Pearson's paper is in the second formula of the following:


iMSRy = ^ = 1 (<r* - - V{cl - air - ^raUl) 


where $\tan 2\varphi = 2r_{xy}\sigma_x\sigma_y/(\sigma_x^2 - \sigma_y^2)$; cot²φ should read “square of semi-major
axis of ellipse (2).” Professor Henry Schultz first noticed this misprint and
suggested to the writer to investigate it.

In a recent letter to Schultz, Pearson pointed out that one of the simplest
formulae for $S_N^2$ or (M.S.R.)² is given by

(a) $S_N^2 = \sigma_x^2\sin^2\varphi + \sigma_y^2\cos^2\varphi,$

where φ is defined by (4). However, Professor Schultz expressed doubt about
its validity. From Lemma 2, it is clear that (a) is also not true.


Institute of Social Sciences, 
Academia Sinica, Peiping 



THE DISTRIBUTION LAWS OF THE DIFFERENCE AND QUOTIENT OF
VARIABLES INDEPENDENTLY DISTRIBUTED IN PEARSON
TYPE III LAWS¹

By Solomon Kullback

Although the results herein described are not entirely new, it is felt that the
method of solution is of interest as presenting further illustrations of the
application of characteristic functions to the distribution problem of statistics (1).


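1. Distribution law of the difference. Let u = x − y, where the distribution
laws of x and y are independent and given respectively by

(1) $f(x) = \frac{e^{-x}x^{p-1}}{\Gamma(p)};\qquad h(y) = \frac{e^{-y}y^{q-1}}{\Gamma(q)}.$

The characteristic function of the distribution law of u is given by (1),

(2) $\varphi(t) = \int_0^\infty e^{itx}\,\frac{e^{-x}x^{p-1}}{\Gamma(p)}\,dx \int_0^\infty e^{-ity}\,\frac{e^{-y}y^{q-1}}{\Gamma(q)}\,dy$

(3) $\qquad = \frac{1}{(1-it)^p\,(1+it)^q}.$

The distribution law of u is given by (1),

(4) $D(u) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-itu}\,dt}{(1-it)^p\,(1+it)^q}.$

Let $1 - it = -z/u$, so that

(5) $D(u) = \frac{e^{-u}u^{p-1}}{2^q}\cdot\frac{1}{2\pi i}\int_{-u-i\infty}^{-u+i\infty} e^{-z}(-z)^{-p}\left(1+\frac{z}{2u}\right)^{-q}dz.$

Now it may be shown that (2)

(6) $\frac{1}{2\pi i}\int_{-u-i\infty}^{-u+i\infty} e^{-z}(-z)^{-p}\left(1+\frac{z}{2u}\right)^{-q}dz = \frac{e^{u}\,(2u)^{\frac{q-p}{2}}}{\Gamma(p)}\;W_{\frac{p-q}{2},\,\frac{1-p-q}{2}}(2u),$

where $W_{k,m}(z)$ is the confluent hypergeometric function (2). Since $W_{k,m}(z) = W_{k,-m}(z)$
we have finally

(7) $D(u) = \frac{u^{\frac{p+q}{2}-1}}{2^{\frac{p+q}{2}}\,\Gamma(p)}\;W_{\frac{p-q}{2},\,\frac{p+q-1}{2}}(2u).$

For p = q, since $W_{0,m}(2x) = \sqrt{2x/\pi}\;K_m(x)$, where $K_m(x)$ is the Bessel Function
of the second kind and imaginary argument (2), we obtain

(8) $D(u) = \frac{u^{p-\frac{1}{2}}\,K_{p-\frac{1}{2}}(u)}{2^{p-\frac{1}{2}}\,\Gamma(p)\sqrt{\pi}}.$

This result has been otherwise obtained by Pearson, Stouffer, and David (3).

¹ Presented to the American Mathematical Society, June 20, 1934.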


2. Distribution law of the quotient. Let u = log x − log y where x and y
are defined as above.

The characteristic function of the distribution law of u is given by (1)

(9) $\varphi(t) = \int_0^\infty \frac{e^{-x}x^{p+it-1}}{\Gamma(p)}\,dx \int_0^\infty \frac{e^{-y}y^{q-it-1}}{\Gamma(q)}\,dy$

(10) $\qquad = \frac{\Gamma(p+it)\,\Gamma(q-it)}{\Gamma(p)\,\Gamma(q)}.$

The distribution law of u is given by (1)

(11) $D(u) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itu}\,\frac{\Gamma(p+it)\,\Gamma(q-it)}{\Gamma(p)\,\Gamma(q)}\,dt.$

Let q − it = −z, so that

(12) $D(u) = \frac{e^{-qu}}{\Gamma(p)\,\Gamma(q)}\cdot\frac{1}{2\pi i}\int_{-q-i\infty}^{-q+i\infty} e^{-zu}\,\Gamma(p+q+z)\,\Gamma(-z)\,dz.$

Now it may be shown that (2)

$\frac{1}{2\pi i}\int_{-q-i\infty}^{-q+i\infty} e^{-zu}\,\Gamma(p+q+z)\,\Gamma(-z)\,dz = \Gamma(p+q)\,(1+e^{-u})^{-(p+q)},$

so that

(13) $D(u) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\cdot\frac{e^{pu}}{(1+e^{u})^{p+q}}.$


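Since $e^u = x/y = w$, we obtain as the distribution law of the quotient

(14) $D(w) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\cdot\frac{w^{p-1}}{(1+w)^{p+q}}.$

If in (13) we set

$p = \frac{n_1}{2},\qquad q = \frac{n_2}{2},\qquad e^u = \frac{n_1}{n_2}\,e^{2z},$

we obtain

(15) $D(z) = \frac{2\,n_1^{n_1/2}\,n_2^{n_2/2}\;\Gamma\!\left(\frac{n_1+n_2}{2}\right)}{\Gamma\!\left(\frac{n_1}{2}\right)\Gamma\!\left(\frac{n_2}{2}\right)}\cdot\frac{e^{n_1 z}}{(n_2 + n_1 e^{2z})^{\frac{n_1+n_2}{2}}},$

which result has been otherwise obtained by R. A. Fisher (4).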
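
A quick numerical check of the two forms of the quotient law may be helpful. The Python sketch below assumes (13) in the form $D(u) = \frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)}\, e^{pu}/(1+e^u)^{p+q}$ and (14) in the form $D(w) = \frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)}\, w^{p-1}/(1+w)^{p+q}$; it verifies by a plain trapezoidal rule that (13) has unit area and that the two densities are related by $w = e^u$. The function names, the values p = 2.5 and q = 1.5, and the integration range are illustrative assumptions.

```python
import math

def _log_constant(p, q):
    # log of Gamma(p + q) / (Gamma(p) Gamma(q))
    return math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)

def density_of_log_quotient(u, p, q):
    """Equation (13): density of u = log x - log y."""
    # log(1 + e^u), arranged so that exp() is never given a large argument
    if u > 0:
        log_one_plus = u + math.log1p(math.exp(-u))
    else:
        log_one_plus = math.log1p(math.exp(u))
    return math.exp(_log_constant(p, q) + p * u - (p + q) * log_one_plus)

def density_of_quotient(w, p, q):
    """Equation (14): density of w = x / y, for w > 0."""
    return math.exp(_log_constant(p, q)
                    + (p - 1) * math.log(w) - (p + q) * math.log1p(w))

def trapezoid(func, lower, upper, steps):
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    total += sum(func(lower + i * width) for i in range(1, steps))
    return total * width

p, q = 2.5, 1.5
area = trapezoid(lambda u: density_of_log_quotient(u, p, q), -60.0, 60.0, 12000)
print(round(area, 6))
```

The change of variable contributes the factor $dw/du = e^u$, so `density_of_log_quotient(u, p, q)` should equal `math.exp(u) * density_of_quotient(math.exp(u), p, q)` at every u.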

George Washington University. 


REFERENCES

(1) Kullback, S.: An application of characteristic functions to the distribution problem
of statistics. Annals of Mathematical Statistics, Vol. 5 (1934), pp. 263-307.

(2) Whittaker and Watson: Modern Analysis, 2nd Ed., pp. 333, 283.

(3) Pearson, Stouffer and David: Further applications in statistics of the Bessel
Function. Biometrika, Vol. 24 (1932), pp. 293.

(4) Fisher, R. A.: On a distribution yielding the error functions of several well known
statistics. Proceedings International Mathematical Congress, Toronto (1924),
Vol. 2, pp. 805-813.



REPORT OF THE MEETING OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS AT ST. LOUIS 

The Institute of Mathematical Statistics held a joint meeting with the 
Econometric Society and the American Mathematical Society at St. Louis, 
Missouri, on January 2, 1936. The program consisted of an invited address, 
“The Mathematical Theory of Index Numbers,” by Professor Thomas Rawles, 
and the following additional papers: 

(1) On Certain Distributions Derived from the Multinomial Distribution, 

by Dr. Solomon Kullback 

(2) Convexity Properties of Generalized Mean Value Functions, by Dr. 

Nilan Norris 

(3) The Frequency Distribution for the Mean of n Independent Chance 

Variables When Each Is Subject to the Law j/oa:’’“*(l—by Prof. 

W. D. Baten 

(4) On the Admissibility of Time Series, by Prof. Francis Regan 

The Institute voted to hold a meeting at Cambridge, Massachusetts, early 
in September of this year. This meeting will be in connection with the celebra¬ 
tion of the Harvard Tercentenary. Professor R. A. Fisher will deliver an 
invited address before the Institute and the American Mathematical Society. 
A more detailed announcement of the meeting will be made later. 





SHEPPARD’S CORRECTIONS FOR A DISCRETE VARIABLE 

Cecil C. Craig 


In the Annals of Mathematical Statistics,¹ J. R. Abernethy gave a derivation
of the corrections to eliminate the systematic errors in the moments of a discrete
variable due to grouping. It is the purpose of this note to considerably shorten
and simplify the derivation of these corrections by an adoption of a device used
by R. A. Fisher (not published so far as I know) in the case of the ordinary
Sheppard's corrections.

¹ “On the Elimination of Systematic Errors Due to Grouping,” vol. IV (1933), pp.
263-277.

Let us suppose that m consecutive values of the discrete variable in question
are grouped in a frequency class of width k. The m smaller intervals of width
k/m go to make up the class width k, the actual points representing the m
values of the variable being plotted at the centers of the sub-intervals. Now
let us suppose that each of m consecutive boundary points of the sub-intervals is
as likely to be chosen as a boundary point of the larger intervals as any other.
Then, if $x_i$ is the class mark of the i-th frequency class, for any true value, x, of
the discrete variable included in this frequency class, we have

$x_i = x + \epsilon$

in which x and ε are independent variables and ε takes on the m values

$-\frac{m-1}{2}\,k/m,\quad -\frac{m-3}{2}\,k/m,\quad \cdots,\quad \frac{m-3}{2}\,k/m,\quad \frac{m-1}{2}\,k/m,$

with the equal relative frequencies 1/m.

The moments of $x_i$ are those calculated from the grouped frequency
distribution; the problem is to express the average values of the moments of x in
terms of the calculated moments and k and m. The use of moment generating
functions at once leads to the desired results. Denoting the s-th moment of $x_i$
about any origin by $\nu_s$, the like moment of x by $\mu_s$, and the respective moment
generating functions of the two variables by $M_{x_i}(\vartheta)$ and $M_x(\vartheta)$ respectively,
we have at once

(1) $M_{x_i}(\vartheta) = M_x(\vartheta)\;\frac{1}{m}\sum_{s=-\frac{m-1}{2}}^{\frac{m-1}{2}} e^{s k\vartheta/m},$




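in which by definition

$M_{x_i}(\vartheta) = 1 + \nu_1\vartheta + \nu_2\vartheta^2/2! + \nu_3\vartheta^3/3! + \cdots,$

$M_x(\vartheta) = 1 + \mu_1\vartheta + \mu_2\vartheta^2/2! + \mu_3\vartheta^3/3! + \cdots.$

The computation necessary to get the actual corrections consists in the
calculation of the coefficients in the formal expansion of

(2) $\frac{1}{m}\sum_{s=-\frac{m-1}{2}}^{\frac{m-1}{2}} e^{s k\vartheta/m}$

in powers of ϑ and then solving for the μ's in (1).

But the summation indicated in (2) is readily effected by means of the calculus
of finite differences. In fact, we get

(3) $M_\epsilon(\vartheta) = \frac{e^{-\frac{m-1}{2}k\vartheta/m}\,\left(e^{k\vartheta} - 1\right)}{m\left(e^{k\vartheta/m} - 1\right)} = \frac{\sinh(k\vartheta/2)}{m\,\sinh(k\vartheta/2m)}.$

Then (1) becomes

(4) $M_{x_i}(\vartheta) = M_x(\vartheta)\;\frac{\sinh(k\vartheta/2)}{m\,\sinh(k\vartheta/2m)}.$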
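
The closed form of the summation is easy to confirm numerically. The following Python sketch assumes the sum in (2) runs over the m equally spaced offsets $(j - \frac{m-1}{2})\,k/m$, j = 0, …, m − 1, each with frequency 1/m, and compares it with $\sinh(k\vartheta/2)/(m\sinh(k\vartheta/2m))$. The function names and the sample values of ϑ, k and m are illustrative only, and ϑ must be non-zero in the closed form.

```python
import math

def offsets_mgf_by_sum(theta, k, m):
    """Direct evaluation of (1/m) * sum of exp(theta * offset)."""
    offsets = [(j - (m - 1) / 2) * k / m for j in range(m)]
    return sum(math.exp(theta * e) for e in offsets) / m

def offsets_mgf_closed_form(theta, k, m):
    """Closed form sinh(k theta / 2) / (m sinh(k theta / (2 m)))."""
    return math.sinh(k * theta / 2) / (m * math.sinh(k * theta / (2 * m)))

def continuous_limit(theta, k):
    """Limit for m -> infinity: sinh(k theta / 2) / (k theta / 2)."""
    return math.sinh(k * theta / 2) / (k * theta / 2)

for theta, k, m in [(0.1, 2, 5), (0.7, 1, 4), (-1.3, 3, 7)]:
    print(offsets_mgf_by_sum(theta, k, m), offsets_mgf_closed_form(theta, k, m))
```

For large m the closed form approaches `continuous_limit(theta, k)`, the factor that yields the ordinary Sheppard's corrections.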


If we let m → ∞ we get the corresponding result for a continuous variable

(5) $M_{x_i}(\vartheta) = M_x(\vartheta)\;\frac{\sinh(k\vartheta/2)}{k\vartheta/2},$

already given by Langdon and Ore,² though in a less elegant manner; for in this
case, the expression analogous to (1) is immediately seen to be

$M_{x_i}(\vartheta) = M_x(\vartheta)\;\frac{1}{k}\int_{-k/2}^{k/2} e^{\vartheta\epsilon}\,d\epsilon.$

Returning to (4), taking the logarithms of both sides, remembering that the
logarithm of the moment generating function is the generating function of the
semi-invariants of Thiele, we get,

(6) $\lambda_1\vartheta + \lambda_2\vartheta^2/2! + \lambda_3\vartheta^3/3! + \cdots = l_1\vartheta + l_2\vartheta^2/2! + l_3\vartheta^3/3! + \cdots - \log\frac{(k\vartheta/2m)\,\sinh(k\vartheta/2)}{(k\vartheta/2)\,\sinh(k\vartheta/2m)},$

in which the $l_r$'s are the calculated semi-invariants and the $\lambda_r$'s the corrected
ones.

² W. H. Langdon and O. Ore, Semi-invariants and Sheppard's Corrections, Annals of
Mathematics, vol. 31 (1930), pp. 230-232.





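But since

$\log\frac{\sinh x}{x} = \sum_{s=1}^{\infty}(-1)^{s-1}\,\frac{B_s}{2s\,(2s)!}\,(2x)^{2s},$

we have on setting:

(7) $-\log\frac{(k\vartheta/2m)\,\sinh(k\vartheta/2)}{(k\vartheta/2)\,\sinh(k\vartheta/2m)} = a_0 + a_1\vartheta + a_2\vartheta^2/2! + a_3\vartheta^3/3! + \cdots,$

(8) $a_0 = 0,\quad a_{2s+1} = 0,\qquad s = 0, 1, 2, \cdots,$

    $a_{2s} = (-1)^s\,\frac{B_s k^{2s}}{2s}\left(1 - \frac{1}{m^{2s}}\right),\qquad s = 1, 2, 3, \cdots.$

Obviously these a's are the “Sheppard's” corrections for the semi-invariants.
We have generally

$\lambda_{2s+1} = l_{2s+1},\qquad s = 0, 1, 2, \cdots,$

$\lambda_{2s} = l_{2s} + (-1)^s\,\frac{B_s}{2s}\left(1 - \frac{1}{m^{2s}}\right)k^{2s},\qquad s = 1, 2, 3, \cdots.$

In particular

$\lambda_2 = l_2 - \left(1 - \frac{1}{m^2}\right)k^2/12, \qquad \lambda_6 = l_6 - \left(1 - \frac{1}{m^6}\right)k^6/252,$

$\lambda_4 = l_4 + \left(1 - \frac{1}{m^4}\right)k^4/120, \qquad \lambda_8 = l_8 + \left(1 - \frac{1}{m^8}\right)k^8/240.$

For m → ∞, these give of course the results reached by Langdon and Ore.*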
To get the corrections for the moments let us set

$\frac{m\,\sinh(k\vartheta/2m)}{\sinh(k\vartheta/2)} = \alpha_0 + \alpha_1\vartheta + \alpha_2\vartheta^2/2! + \alpha_3\vartheta^3/3! + \cdots.$

From (7) and (8)

$\alpha_0 = 1,\quad \alpha_{2n+1} = 0,\qquad n = 0, 1, 2, \cdots,$

(9) $\alpha_{2n} = \sum\frac{(2n)!\; a_2^r\, a_4^s\, a_6^t \cdots}{(2!)^r (4!)^s (6!)^t \cdots\; r!\, s!\, t! \cdots},$

the summation extending over all non-negative integral values of r, s, t, ⋯ for which

$r + 2s + 3t + \cdots = n.$

* Loc. cit.

Then finally we have the formula,

(10) $\mu_n = \sum_{s=0}^{[n/2]}\binom{n}{2s}\,\alpha_{2s}\,\nu_{n-2s}$

for the corrected moments.

Writing out the first four α's, we have for the first eight moments about the
mean

$\mu_1 = \nu_1 = 0$

$\mu_2 = \nu_2 - (1 - 1/m^2)k^2/12$

$\mu_3 = \nu_3$

$\mu_4 = \nu_4 - (1 - 1/m^2)\nu_2 k^2/2 + (1 - 1/m^2)(7 - 3/m^2)k^4/240$

$\mu_5 = \nu_5 - 5(1 - 1/m^2)\nu_3 k^2/6$

$\mu_6 = \nu_6 - 5(1 - 1/m^2)\nu_4 k^2/4 + (1 - 1/m^2)(7 - 3/m^2)\nu_2 k^4/16$
$\qquad - (1 - 1/m^2)(31 - 18/m^2 + 3/m^4)k^6/1344$

$\mu_7 = \nu_7 - 7(1 - 1/m^2)\nu_5 k^2/4 + 7(1 - 1/m^2)(7 - 3/m^2)\nu_3 k^4/48$

$\mu_8 = \nu_8 - 7(1 - 1/m^2)\nu_6 k^2/3 + 7(1 - 1/m^2)(7 - 3/m^2)\nu_4 k^4/24$
$\qquad - (1 - 1/m^2)(31 - 18/m^2 + 3/m^4)\nu_2 k^6/48$
$\qquad + (1 - 1/m^2)(381 - 239/m^2 + 55/m^4 - 5/m^6)k^8/11520.$

The final term in $\mu_{2n}$ as given above is $\alpha_{2n}$.
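
The corrections for the second, third and fourth moments can be verified exactly under the model of this note. The Python sketch below is an illustration with hypothetical names and an arbitrary small distribution: it builds the distribution of $x_i = x + \epsilon$, with ε independent of x and uniform on the m offsets $(j - \frac{m-1}{2})\,k/m$, computes the moments ν about the mean with exact rational arithmetic, applies the corrections $\mu_2 = \nu_2 - (1 - 1/m^2)k^2/12$, $\mu_3 = \nu_3$, $\mu_4 = \nu_4 - (1 - 1/m^2)\nu_2 k^2/2 + (1 - 1/m^2)(7 - 3/m^2)k^4/240$, and compares the result with the moments of x itself.

```python
from fractions import Fraction

def central_moment(dist, order):
    """Moment of the given order about the mean; dist maps value -> probability."""
    mean = sum(prob * value for value, prob in dist.items())
    return sum(prob * (value - mean) ** order for value, prob in dist.items())

def group(dist, k, m):
    """Distribution of x_i = x + eps for eps uniform on m equally spaced offsets."""
    offsets = [Fraction(2 * j - (m - 1), 2) * Fraction(k, m) for j in range(m)]
    grouped = {}
    for value, prob in dist.items():
        for eps in offsets:
            grouped[value + eps] = grouped.get(value + eps, Fraction(0)) + prob / m
    return grouped

def corrected_central_moments(nu2, nu3, nu4, k, m):
    """Apply the discrete-variable corrections for the 2nd to 4th moments."""
    f = 1 - Fraction(1, m * m)
    mu2 = nu2 - f * k ** 2 / 12
    mu3 = nu3
    mu4 = nu4 - f * nu2 * k ** 2 / 2 + f * (7 - Fraction(3, m * m)) * k ** 4 / 240
    return mu2, mu3, mu4

# A hypothetical "true" distribution of x, chosen only for illustration.
true_dist = {Fraction(0): Fraction(1, 4),
             Fraction(1): Fraction(1, 2),
             Fraction(3): Fraction(1, 4)}
k, m = 5, 5
grouped = group(true_dist, k, m)
nus = [central_moment(grouped, r) for r in (2, 3, 4)]
print(corrected_central_moments(*nus, k, m))
print(tuple(central_moment(true_dist, r) for r in (2, 3, 4)))
```

Because the arithmetic is exact, the two printed tuples should agree term for term whenever ε is independent of x, which is the assumption on which the corrections rest.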

The above method is readily extended to the case of two or more variables.
We will illustrate the procedure by getting the results likely to be required for
two variables. As before we suppose that m consecutive values of x are grouped
in a frequency class of width k, and we shall similarly suppose that n values of y
are grouped in a frequency class of width l. And arguing as before we write now

$x_i = x + \epsilon$

$y_i = y + \eta$

in which ε and η are independent of x and y and of each other.

The moment generating function of two variables is defined by the identity
in ϑ and ω:

$M_{x,y}(\vartheta, \omega) = 1 + (\mu_{10}\vartheta + \mu_{01}\omega) + \frac{1}{2!}(\mu_{20}\vartheta^2 + 2\mu_{11}\vartheta\omega + \mu_{02}\omega^2) + \cdots$

$\qquad = 1 + (\mu_{10}\vartheta + \mu_{01}\omega) + \frac{1}{2!}(\mu_{10}\vartheta + \mu_{01}\omega)^{(2)} + \frac{1}{3!}(\mu_{10}\vartheta + \mu_{01}\omega)^{(3)} + \cdots,$

in which the manner of expansion of $(\mu_{10}\vartheta + \mu_{01}\omega)^{(r)}$ is evident.





Then from the properties of moment generating functions, we have 


n—l 


k/m 


l/n 




( 11 ) 


.n—l , , n — l , , 

-— k/m If--— l/n 


mn 


— M ^ sinh A;t?/2 sinh Za)/2 

^ m sinh Art?/2m n sinh Jw/2n * 


As in the case of a single variable it will be simpler first to get the corrections 
for the semi-invariants. The logarithm of the moment generating function 
is the generating function of the semi-invariants; thus 

log Af*,y(t?, oi) = (X]ot? + Xlow) + ^ (Xiot^ + ^ (Xiot? + 4~ * * * > 


in which 


(Xiot? + Xoiw)^^^ = Xsot^^ + 3X2lt^^t*) -|- 3Xi2t^C*j2 -|- XoiO)*, 


etc. 

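We write (see (7)),

\[ \log\frac{m\sinh(k\vartheta/2m)}{\sinh(k\vartheta/2)} = a_2\vartheta^2/2! + a_4\vartheta^4/4! + \cdots, \qquad \log\frac{n\sinh(l\omega/2n)}{\sinh(l\omega/2)} = b_2\omega^2/2! + b_4\omega^4/4! + \cdots, \tag{12} \]

with

\[ a_{2r} = -\frac{B_{2r}k^{2r}}{2r}\,(1 - 1/m^{2r}), \qquad b_{2s} = -\frac{B_{2s}l^{2s}}{2s}\,(1 - 1/n^{2s}), \tag{13} \]

the $B_{2r}$ being the Bernoulli numbers in the convention $B_2 = 1/6$, $B_4 = -1/30$, so that $a_2 = -(1 - 1/m^2)k^2/12$.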


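Then from (11) we have

\[ (\lambda_{10}\vartheta + \lambda_{01}\omega)^{(2s+1)} = (L_{10}\vartheta + L_{01}\omega)^{(2s+1)}, \qquad s = 0, 1, 2, \cdots \]

\[ (\lambda_{10}\vartheta + \lambda_{01}\omega)^{(2s)} = (L_{10}\vartheta + L_{01}\omega)^{(2s)} + a_{2s}\vartheta^{2s} + b_{2s}\omega^{2s}, \qquad s = 1, 2, 3, \cdots, \]

in which, of course, $L_{rs}$ is a calculated semi-invariant and $\lambda_{rs}$ a corrected one.
We read off

\[ \lambda_{rs} = L_{rs}, \qquad r, s \neq 0, \]

as already shown by Wold in the case of continuous variables,^

\[ \lambda_{2s+1,0} = L_{2s+1,0}, \qquad \lambda_{0,2s+1} = L_{0,2s+1}. \]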

^Herman Wold: Sheppard's Correction Formulae in Several Variables: Skandinavisk 
Aktuarietidskrift, vol. XVII (1934), pp. 248-255. 





CECIL C. CRAIG- 


The values of $\lambda_{2s,0}$ are the same as those for $\lambda_{2s}$ given above and those for $\lambda_{0,2s}$
are obtained from these merely by replacing in them m and k by n and l. And
it is quite obvious that for any number of variables the only semi-invariants to
be corrected are those in which a single figure of the index is different from
zero and is moreover even. For such semi-invariants the corrections are
naturally those derived for a single variable.

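Now to derive the corrections for the moments, we write

\[ \frac{m\sinh(k\vartheta/2m)}{\sinh(k\vartheta/2)}\cdot\frac{n\sinh(l\omega/2n)}{\sinh(l\omega/2)} = e^{\frac{1}{2!}(a_2\vartheta^2 + b_2\omega^2) + \frac{1}{4!}(a_4\vartheta^4 + b_4\omega^4) + \cdots} = 1 + \frac{1}{2!}(\alpha_{20}\vartheta + \alpha_{02}\omega)^{(2)} + \frac{1}{4!}(\alpha_{20}\vartheta + \alpha_{02}\omega)^{(4)} + \cdots, \]

with now,

\[ (\alpha_{20}\vartheta + \alpha_{02}\omega)^{(2h)} = \sum \frac{(2h)!\,(a_2 + b_2)^r (a_4 + b_4)^s \cdots}{(2!)^r (4!)^s \cdots\, r!\, s! \cdots}, \]

the summation to be over all positive integral values of r, s, ··· for which

\[ r + 2s + \cdots = h, \]

and in which the parameters $\vartheta$ and $\omega$ may be omitted without ambiguity.
The formula for the corrected moments can now be written

\[ (\mu_{10} + \mu_{01})^{(n)} = \sum_{g=0} \binom{n}{2g} (\alpha_{20} + \alpha_{02})^{(2g)} (\nu_{10} + \nu_{01})^{(n-2g)}. \tag{14} \]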

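This gives

\[ \mu_{10} + \mu_{01} = \nu_{10} + \nu_{01} \]

\[ (\mu_{10} + \mu_{01})^{(2)} = (\nu_{10} + \nu_{01})^{(2)} + (\alpha_{20} + \alpha_{02})^{(2)} \]

\[ (\mu_{10} + \mu_{01})^{(3)} = (\nu_{10} + \nu_{01})^{(3)} + 3(\nu_{10} + \nu_{01})(\alpha_{20} + \alpha_{02})^{(2)} \]

\[ (\mu_{10} + \mu_{01})^{(4)} = (\nu_{10} + \nu_{01})^{(4)} + 6(\nu_{10} + \nu_{01})^{(2)}(\alpha_{20} + \alpha_{02})^{(2)} + (\alpha_{20} + \alpha_{02})^{(4)}. \]

Noting that,

\[ (\alpha_{20} + \alpha_{02})^{(4)} = a_4 + b_4 + 3(a_2 + b_2)^2, \]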

we get the following formulas for the correction of the product moments about
an arbitrary origin:

\[ \mu'_{11} = \nu'_{11} \]

\[ \mu'_{21} = \nu'_{21} - (1 - 1/m^2)\,\nu'_{01} k^2/12 \]

\[ \mu'_{12} = \nu'_{12} - (1 - 1/n^2)\,\nu'_{10} l^2/12 \]

\[ \mu'_{31} = \nu'_{31} - (1 - 1/m^2)\,\nu'_{11} k^2/4 \]

\[ \mu'_{22} = \nu'_{22} - (1 - 1/m^2)\,\nu'_{02} k^2/12 - (1 - 1/n^2)\,\nu'_{20} l^2/12 + (1 - 1/m^2)(1 - 1/n^2)\,k^2 l^2/144 \]

\[ \mu'_{13} = \nu'_{13} - (1 - 1/n^2)\,\nu'_{11} l^2/4. \]






The above results give the corrections for moments about the mean, merely
by dropping the primes and setting $\nu_{10} = \nu_{01} = 0$. In practice the corrections
needed are for moments about the mean, and though there would be no difficulty
in computing additional results for an arbitrary origin, I shall give here only the
additional results for moments about the mean through the sixth order, omitting
those obtained merely by permutation of subscripts and interchange of k and m
with l and n respectively.

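First, the necessary extension of (14) is

\[ (\mu_{10} + \mu_{01})^{(5)} = (\nu_{10} + \nu_{01})^{(5)} + 10(\nu_{10} + \nu_{01})^{(3)}(\alpha_{20} + \alpha_{02})^{(2)} \]

\[ (\mu_{10} + \mu_{01})^{(6)} = (\nu_{10} + \nu_{01})^{(6)} + 15(\nu_{10} + \nu_{01})^{(4)}(\alpha_{20} + \alpha_{02})^{(2)} + 15(\nu_{10} + \nu_{01})^{(2)}(\alpha_{20} + \alpha_{02})^{(4)} + (\alpha_{20} + \alpha_{02})^{(6)}. \tag{15} \]

We need the additional relation:

\[ (\alpha_{20} + \alpha_{02})^{(6)} = a_6 + b_6 + 15(a_4 + b_4)(a_2 + b_2) + 15(a_2 + b_2)^3. \]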

The additional formulas for product moments about the mean follow:

\[ \mu_{41} = \nu_{41} - (1 - 1/m^2)\,\nu_{21} k^2/2 \]

\[ \mu_{32} = \nu_{32} - (1 - 1/n^2)\,\nu_{30} l^2/12 - (1 - 1/m^2)\,\nu_{12} k^2/4 \]

\[ \mu_{51} = \nu_{51} - 5(1 - 1/m^2)\,\nu_{31} k^2/6 + (1 - 1/m^2)(7 - 3/m^2)\,\nu_{11} k^4/48 \]

\[ \mu_{42} = \nu_{42} - (1 - 1/n^2)\,\nu_{40} l^2/12 - (1 - 1/m^2)\,\nu_{22} k^2/2 + (1 - 1/m^2)(1 - 1/n^2)\,\nu_{20} k^2 l^2/24 + (1 - 1/m^2)(7 - 3/m^2)\,\nu_{02} k^4/240 - (1 - 1/m^2)(7 - 3/m^2)(1 - 1/n^2)\,k^4 l^2/2880 \]

\[ \mu_{33} = \nu_{33} - (1 - 1/m^2)\,\nu_{13} k^2/4 - (1 - 1/n^2)\,\nu_{31} l^2/4 + (1 - 1/m^2)(1 - 1/n^2)\,\nu_{11} k^2 l^2/16. \]
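The product-moment corrections about the mean can be verified exactly in the same spirit. The following is a minimal sketch under stated assumptions: the joint distribution of (x, y), the grouping constants m, k, n, l and the helper names are all hypothetical, and the grouping errors are taken as independent uniform variables on m and n equally spaced points. It checks the corrections for $\mu_{22}$ and $\mu_{31}$ in the form given by the symbolic expansion (14).

```python
from fractions import Fraction as F
from itertools import product


def grid_error(m, k):
    """Values of a uniform grouping error on m equally spaced points, width k."""
    step = F(k) / m
    return [F(2 * i - (m - 1), 2) * step for i in range(m)]


def central_product_moment(points, a, b):
    """E[(x - Ex)^a (y - Ey)^b] for a list of (x, y, probability) triples."""
    mx = sum(p * x for x, y, p in points)
    my = sum(p * y for x, y, p in points)
    return sum(p * (x - mx) ** a * (y - my) ** b for x, y, p in points)


# Hypothetical correlated pair (x, y) and grouping constants.
joint = [(F(0), F(0), F(1, 4)), (F(1), F(2), F(1, 2)), (F(3), F(1), F(1, 4))]
m, k, n, l = 4, 2, 3, 5

# Grouped variables x1 = x + eps, y1 = y + eta with independent errors.
grouped = [
    (x + e, y + h, p / (m * n))
    for (x, y, p), e, h in product(joint, grid_error(m, k), grid_error(n, l))
]


def nu(a, b):
    """Calculated (uncorrected) central product moment of the grouped pair."""
    return central_product_moment(grouped, a, b)


cm = 1 - F(1, m * m)
cn = 1 - F(1, n * n)
k2, l2 = F(k) ** 2, F(l) ** 2

mu22_corrected = (nu(2, 2) - cm * nu(0, 2) * k2 / 12
                  - cn * nu(2, 0) * l2 / 12 + cm * cn * k2 * l2 / 144)
mu31_corrected = nu(3, 1) - cm * nu(1, 1) * k2 / 4

mu22_true = central_product_moment(joint, 2, 2)
mu31_true = central_product_moment(joint, 3, 1)
```

With exact fractions the corrected values agree with the true ones identically, which illustrates that the corrections hold on the average over the grouping errors rather than for any single grouping.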


For m and n infinite these results give the formulas for two continuous variables
already found by Baten® and Wold.®

The reader will note that this development does not impose the "high contact"
condition, except in so far as it assumes the existence of the moments that
occur in the formulas. And it exhibits in the clearest fashion that Sheppard's
corrections are corrections on the average.

University of Michigan.


* W. D. Baten: Corrections for the Moments of a Frequency Distribution in Two Variables;
Annals of Mathematical Statistics, vol. II (1931), pp. 309-319.

• Loc. cit., p. 263. 



FUNDAMENTALS OF THE THEORY OF INVERSE SAMPLING¹

By Ching-Lai Shen
Part I. Introduction²

Section I. Statistical Concepts of the Theory of Sampling 

One of the chief objects in statistics is to form a judgment of a very large 
statistical universe, known as a parent population, by means of a study of a part 
or sample thereof, which is drawn at random. To make a complete survey of 
the parent population is sometimes impossible or impractical. For example, 
it is impossible to measure the heights of all adult persons in a country. It is 
impractical to test for infectious bacteria the whole body of water in a city 
reservoir. All that we can do is to obtain an unbiased sample. By an unbiased 
sample, we mean a sample in which each individual has an equal and independent 
chance to be included. From this chosen sample we attempt to draw some conclusion
concerning the nature of the whole parent population in accordance with
certain mathematical principles.

Now the sample which we choose is of course only one of the samples that can
possibly be drawn from a given parent population. Suppose there is a population
of s individuals from which we wish to choose a sample of r. It is clear
that there exist $_sC_r$ such samples, each of which is equally likely to be chosen.
Therefore these $_sC_r$ samples constitute the so-called distribution of samples.
To describe from the statistical point of view the distribution of samples, we
must find its mean, standard deviation, skewness, excess, and other higher
characteristics. The first three are usually referred to as elementary statistical
functions.

Suppose $x_i$ be the variate (by which we mean the magnitude of a specified
character of an individual to be measured) where i = 1, 2, 3, ···, s; and $z_j$ be the
samples chosen from the parent population where j = 1, 2, 3, ···, $_sC_r$. Then
the $_sC_r$ samples, each consisting of r variables, will be formed after the following
fashion:

\[ z_1 = x_1 + x_2 + x_3 + \cdots + x_r \]

\[ z_2 = x_2 + x_3 + x_4 + \cdots + x_{r+1} \]

\[ \cdots \]

\[ z_{{}_sC_r} = x_{s-r+1} + x_{s-r+2} + x_{s-r+3} + \cdots + x_s \]

¹ A dissertation submitted in partial fulfillment of the requirement for the degree of doctor
of philosophy in the University of Michigan.

² The writer wishes to express his appreciation for the assistance Professor H. C. Carver
has given him in making this study.




If we denote the nth moment of the parent population about its mean by

\[ \mu_{n:x} = \frac{\sum_{i=1}^{s}(x_i - M_x)^n}{s} \]

and the nth moment of the distribution of samples about its mean by

\[ \mu_{n:z} = \frac{\sum_{j=1}^{\binom{s}{r}}(z_j - M_z)^n}{\binom{s}{r}}, \]

and if we then utilize the multinomial theorem, we may be able to express the
sample moments in terms of the moments of the parent population:³


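\[ \begin{aligned}
M_z &= rM_x \\
\mu_{2:z} &= 2!\left\{P_2\,\frac{s\mu_{2:x}}{2!}\right\} \\
\mu_{3:z} &= 3!\left\{P_3\,\frac{s\mu_{3:x}}{3!}\right\} \\
\mu_{4:z} &= 4!\left\{P_4\,\frac{s\mu_{4:x}}{4!} + P_2^2\,\frac{s^2\mu_{2:x}^2}{2!\,(2!)^2}\right\} \\
\mu_{5:z} &= 5!\left\{P_5\,\frac{s\mu_{5:x}}{5!} + P_3P_2\,\frac{s^2\mu_{3:x}\mu_{2:x}}{3!\,2!}\right\} \\
\mu_{6:z} &= 6!\left\{P_6\,\frac{s\mu_{6:x}}{6!} + P_4P_2\,\frac{s^2\mu_{4:x}\mu_{2:x}}{4!\,2!} + P_3^2\,\frac{s^2\mu_{3:x}^2}{2!\,(3!)^2} + P_2^3\,\frac{s^3\mu_{2:x}^3}{3!\,(2!)^3}\right\}, \text{ etc.}
\end{aligned} \tag{1} \]

where $P_n$ is obtained from the sampling polynomial $P_n(p)$ by writing $p^k$ as $p_k$:

\[ \begin{aligned}
P_1(p) &= p \\
P_2(p) &= p - p^2 \\
P_3(p) &= p - 3p^2 + 2p^3 \\
P_4(p) &= p - 7p^2 + 12p^3 - 6p^4 \\
P_5(p) &= p - 15p^2 + 50p^3 - 60p^4 + 24p^5 \\
P_6(p) &= p - 31p^2 + 180p^3 - 390p^4 + 360p^5 - 120p^6, \text{ etc.}
\end{aligned} \tag{2} \]

where

\[ p_j = \frac{r(r-1)(r-2)\cdots(r-j+1)}{s(s-1)(s-2)\cdots(s-j+1)}. \]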
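The rule "write $p^k$ as $p_k$" is easy to mechanize. The sketch below is an illustration only, with hypothetical function names: it stores the coefficients of the sampling polynomials as listed in (2), forms symbolic products such as $P_2^2$ by ordinary polynomial multiplication before the substitution, and evaluates everything in exact fractions for the case s = 1000, r = 100 used later in the text.

```python
from fractions import Fraction

# Coefficients of p^1, p^2, ... in the sampling polynomials P_n(p) of (2).
SAMPLING_POLY = {
    1: [1],
    2: [1, -1],
    3: [1, -3, 2],
    4: [1, -7, 12, -6],
    5: [1, -15, 50, -60, 24],
    6: [1, -31, 180, -390, 360, -120],
}


def falling_ratio(r, s, j):
    """p_j = r(r-1)...(r-j+1) / (s(s-1)...(s-j+1))."""
    value = Fraction(1)
    for i in range(j):
        value *= Fraction(r - i, s - i)
    return value


def poly_mul(a, b):
    """Product of two polynomials stored as coefficient lists starting at p^1."""
    out = [0] * (len(a) + len(b))
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            # p^(i+1) * p^(j+1) = p^(i+j+2), stored at index i+j+1
            out[i + j + 1] += ca * cb
    return out


def evaluate(coeffs, r, s):
    """Replace each power p^k by p_k and add up the terms."""
    return sum(c * falling_ratio(r, s, k) for k, c in enumerate(coeffs, start=1))


r, s = 100, 1000
P = {n: evaluate(c, r, s) for n, c in SAMPLING_POLY.items()}
P2_squared = evaluate(poly_mul(SAMPLING_POLY[2], SAMPLING_POLY[2]), r, s)
P3_P2 = evaluate(poly_mul(SAMPLING_POLY[3], SAMPLING_POLY[2]), r, s)
```

Note that the symbolic square `P2_squared` is not the arithmetic square of `P[2]`; the substitution of $p_k$ for $p^k$ is made after the polynomials are multiplied.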


³ Carver, H. C., Annals of Mathematical Statistics, Vol. I, No. I, pp. 106-107.





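Section II. Frequency Curve of the Distribution of Samples

The frequency distribution of samples is usually less scattered than individual
observations. In order to ascertain the manner of the distribution, we have
access to the well-known Type A Curve of Charlier.⁴

\[ F(t) = \phi(t) - \frac{c_3}{3!}\phi^{(3)}(t) + \frac{c_4}{4!}\phi^{(4)}(t) - \frac{c_5}{5!}\phi^{(5)}(t) + \cdots \tag{3} \]

where

\[ \phi(t) = \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2} \]

\[ \begin{aligned}
c_3 &= \alpha_3 \\
c_4 &= \alpha_4 - 3 \\
c_5 &= \alpha_5 - 10\alpha_3 \\
c_6 &= \alpha_6 - 15\alpha_4 + 30 \\
c_7 &= \alpha_7 - 21\alpha_5 + 105\alpha_3 \\
c_8 &= \alpha_8 - 28\alpha_6 + 210\alpha_4 - 315, \text{ etc.}
\end{aligned} \]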

This formula is a powerful tool for representing any frequency; but it is
emphasized by more than one author⁵ that the usefulness of such a series
representation of a frequency distribution depends upon the rapidity of convergence,
and the rapidity of convergence in turn depends upon the extent to which the
function $\phi(t)$ is a fair approximation for F(t). We shall not, however, discuss
here the question of convergence. What we are interested in is to apply this
series representation to the distribution of samples and see whether our numerical
experimentation justifies the use of it.

TABLE I

Heights of 1000 Freshman Students
(Original Measurements Made to Nearest 0.1 in.)

Class          Frequency
58.5-60.4          2
60.5-62.4         13
62.5-64.4         76
64.5-66.4        167
66.5-68.4        335
68.5-70.4        264
70.5-72.4        106
72.5-74.4         20
74.5-76.4          7
76.5-78.4          1


⁴ Camp, B. H., The Mathematical Part of Elementary Statistics, p. 226.

* H- Tj.. Mathemniirnl Statistics 62 . 

Carver, H. C., Frequency Curves, Handbook of Mathematical Statistics, p. 116.






First of all, therefore, we take for our numerical example the heights of 1000 
freshman students in the University of Michigan, as recorded in Table I, which 
are assumed to constitute our parent population. 

From the above data we compute the first 6 moments as follows:

$M_x$ = 67.91
$\mu_{2:x}$ = 6.279,068            $\sigma_x$ = 2.505,81
$\mu_{3:x}$ = 0.489,552            $\alpha_{3:x}$ = 0.031,11
$\mu_{4:x}$ = 132.685,214          $\alpha_{4:x}$ = 3.365,36
$\mu_{5:x}$ = 78.435,794           $\alpha_{5:x}$ = 0.793,92
$\mu_{6:x}$ = 4574.080,554         $\alpha_{6:x}$ = 18.476,43

Now suppose from this parent population in which s = 1000, we wish to
choose $_{1000}C_{100}$ samples, each consisting of 100 individuals. To characterize
the distribution of these samples, we first make the following table:


TABLE II

Values of $p_j$ and $P_j$ for s = 1000, r = 100

$p_1$ = .1
$p_2$ = .009,909,909,91
$p_3$ = .000,973,117,406
$p_4$ = .000,094,676,417,6
$p_5$ = .000,009,125,437,84
$p_6$ = .000,000,871,272,959,5

$P_1$ = .1
$P_2$ = .090,090,090,09
$P_3$ = .072,216,505,082
$P_4$ = .041,739,980,994
$P_5$ = - .005,454,352,918
$P_6$ = - .065,789,272,230

$P_2^2$ = .008,058,351,516
$P_3P_2$ = .006,472,571,500
$P_4P_2$ = .003,764,792,358
$P_2^3$ = .000,715,593,194
$P_3^2$ = .005,195,978,741


Substituting into formulae (1), we obtain the first six moments of the distribution
of samples:

$M_z$ = 6791
$\mu_{2:z}$ = 565.621,622                   $\sigma_z$ = 23.782,8
$\mu_{3:z}$ = 35.353,734                    $\alpha_{3:z}$ = .002,628
$\mu_{4:z}$ = 958,720.852,854               $\alpha_{4:z}$ = 2.996,679
$\mu_{5:z}$ = 198,538.702,142               $\alpha_{5:z}$ = .026,093
$\mu_{6:z}$ = 2,704,514,780.791,465         $\alpha_{6:z}$ = 14.945,539







The coefficients of Charlier's Type A Curve turn out to be very small and
rapidly decreasing:

$c_3/3!$ = .000,438
$c_4/4!$ = - .000,138
$c_5/5!$ = - .000,016
$c_6/6!$ = - .000,006

We therefore may be justified in considering this series representation of the
sample distribution as converging rapidly to the normal curve. It may be
interesting to note that even from a parent population which is very skew, the
distribution of samples is nearly normal, as the following example will show:

TABLE III

Weights of 1000 Freshman Students
(Original Measurements Made to Nearest Pound)

Class      Frequency
 85-            1
 95-            8
105-           45
115-          132
125-          232
135-          244
145-          161
155-           97
165-           50
175-           16
185-            7
195-            3
205-            4

$M_x$ = 139.32
$\mu_{2:x}$ = 296.8343             $\sigma_x$ = 17.228,87
$\mu_{3:x}$ = 3,230.802            $\alpha_{3:x}$ = 0.631,74
$\mu_{4:x}$ = 351,180.14           $\alpha_{4:x}$ = 3.985,67
$\mu_{5:x}$ = 11,811,480.5         $\alpha_{5:x}$ = 7.780,71
$\mu_{6:x}$ = 886,586,271          $\alpha_{6:x}$ = 33.898,36




$M_z$ = 13,932
$\mu_{2:z}$ = 26,741.828,829                       $\sigma_z$ = 163.529
$\mu_{3:z}$ = 233,317.229,045                      $\alpha_{3:z}$ = .05334
$\mu_{4:z}$ = 2,144,736,851.477,805                $\alpha_{4:z}$ = 2.9991
$\mu_{5:z}$ = 62,008,368,279.121,883               $\alpha_{5:z}$ = .53024
$\mu_{6:z}$ = 287,107,828,746,809.017              $\alpha_{6:z}$ = 15.00633

$c_3/3!$ = .008,89
$c_4/4!$ = - .000,04
$c_5/5!$ = - .000,03
$c_6/6!$ = .000,03
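The first few sample moments of the weights can be reproduced from formulae (1) and the entries of Table II. This is only a sketch with hypothetical variable names. One assumption is made and flagged in the code: the parent fourth moment is taken as 351,180.14, the value consistent with the tabulated $\alpha_{4:x}$ = 3.98567 and with the tabulated $\mu_{4:z}$.

```python
# Entries of Table II (s = 1000, r = 100). P2_SQUARED is the symbolic
# product P_2^2, i.e. (p - p^2)^2 with each power p^k replaced by p_k.
S = 1000
P2, P3, P4 = 0.09009009009, 0.072216505082, 0.041739980994
P2_SQUARED = 0.008058351516

# Parent moments of the weights (Table III). MU4 = 351,180.14 is an
# assumption: it is the value consistent with alpha_4 = 3.98567.
MU2, MU3, MU4 = 296.8343, 3230.802, 351180.14

# Formulae (1) for the distribution of sample sums.
mu2_z = P2 * S * MU2
mu3_z = P3 * S * MU3
mu4_z = P4 * S * MU4 + 3 * P2_SQUARED * S**2 * MU2**2   # 4!/(2!(2!)^2) = 3

# Standardized moments and the leading Type A coefficients.
alpha3_z = mu3_z / mu2_z**1.5
alpha4_z = mu4_z / mu2_z**2
c3_over_3fact = alpha3_z / 6
c4_over_4fact = (alpha4_z - 3) / 24
```

The resulting skewness is about one twelfth of that of the parent population, which is the numerical content of the remark that the distribution of samples is nearly normal.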

Indeed the distribution of samples, in general, is very nearly normal irrespective
of the law of distribution of the parent population. From the practical
point of view, as Professor H. C. Carver has remarked, the parent population has
little control over the shape of the distribution of the samples if r is fifty or
greater and if s is at least ten times as large as r.⁶

Now as a numerical illustration of the theory of sampling I may, for example, 
choose at random 100 weights from the parent population of 1000 weights of 
freshman students, as recorded in Table III, with the aim of ascertaining the 
probability that the mean of this sample exceeds 142 pounds. 

Since we define the mean of a sample simply as the average measurement of
the r individuals in the sample, which in this case is 100, it therefore follows that
the ordinary moments of the distribution of sample means differ from those of
the distribution of samples in (1) only by a constant multiple of $1/r^k$ where k is
the order of the moments concerned, while the standardized moments remain
unchanged. Therefore in this problem, we have the mean of the sample means
equal to 139.32 and the standard deviation equal to 1.63529. The average
weight, 142 pounds, may be expressed in standard units as

\[ t = \frac{z - M_z}{\sigma_z} = \frac{142 - 139.32}{1.63529} = 1.63885. \]

In accordance with (3), the probability that the mean of the sample exceeds
142 pounds is therefore equal to

\[ P = \int_{1.63885}^{\infty}\left[\phi(t) - \frac{c_3}{3!}\phi^{(3)}(t) + \frac{c_4}{4!}\phi^{(4)}(t) - \frac{c_5}{5!}\phi^{(5)}(t) + \cdots\right]dt. \]


⁶ Carver, H. C., Annals of Mathematical Statistics, Vol. I, No. I, p. 112.








If we take the first term only,

\[ P = \int_{1.63885}^{\infty}\phi(t)\,dt = .05062. \]

If we take the first two terms,

\[ P = \int_{1.63885}^{\infty}\phi(t)\,dt - .00889\,\phi^{(2)}(t)\Big]_{1.63885}^{\infty} = .05218. \]

If we take the first three terms,

\[ P = \int_{1.63885}^{\infty}\phi(t)\,dt - .00889\,\phi^{(2)}(t)\Big]_{1.63885}^{\infty} + (-.000,04)\,\phi^{(3)}(t)\Big]_{1.63885}^{\infty} = .052182. \]
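These three tail values can be reproduced with a few lines of arithmetic. The sketch below assumes only the standard identities $\phi^{(2)}(t) = (t^2 - 1)\phi(t)$ and $\phi^{(3)}(t) = -(t^3 - 3t)\phi(t)$, together with the fact that every derivative of $\phi$ vanishes at infinity; the function names are illustrative.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def phi2(t):
    """Second derivative of the normal density."""
    return (t * t - 1) * phi(t)


def phi3(t):
    """Third derivative of the normal density."""
    return -(t ** 3 - 3 * t) * phi(t)


def upper_tail(t):
    """Integral of phi from t to infinity."""
    return 0.5 * math.erfc(t / math.sqrt(2))


t = (142 - 139.32) / 1.63529
c3_over_3fact, c4_over_4fact = 0.00889, -0.00004   # coefficients quoted in the text

# The integral of phi^(k) from t to infinity is -phi^(k-1)(t).
one_term = upper_tail(t)
two_terms = one_term + c3_over_3fact * phi2(t)
three_terms = two_terms - c4_over_4fact * phi3(t)
```

The skewness term moves the probability from about .0506 to about .0522, while the next term changes only the sixth decimal place.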

Section III. Pearsonian Types of Curves 

Charlier's Type A Series is, however, not the only known analytic representation
of a frequency distribution. There are Pearsonian Types of Curves, the
characteristics of which I shall need to summarize briefly. These Pearsonian
Types of Curves are essential to the later development of our theory.

The curves, suggested by certain geometrical properties of unimodal frequency
distribution, are all obtained from the solution of the differential equation:

\[ \frac{1}{y}\frac{dy}{dt} = \frac{a - t}{f(t)} \]

where f(t) is assumed to be possibly expanded into a convergent power series,
that is, $f(t) = b_0 + b_1 t + b_2 t^2 + \cdots$. When the first three terms of the power
series are taken, the differential equation immediately takes the form of

\[ \frac{1}{y}\frac{dy}{dt} = \frac{a - t}{b_0 + b_1 t + b_2 t^2}. \]

The parameters, $a$, $b_0$, $b_1$, $b_2$, may be expressed in terms of moments:⁷

\[ a = \frac{-\alpha_3}{2(1 + 2\delta)}, \qquad b_0 = \frac{2 + \delta}{2(1 + 2\delta)}, \qquad b_1 = \frac{\alpha_3}{2(1 + 2\delta)}, \qquad b_2 = \frac{\delta}{2(1 + 2\delta)}, \]

where

\[ \delta = \frac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}. \]
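As a small illustration of these expressions, the following sketch computes $\delta$ and the four parameters from a given skewness and kurtosis in standard units. The function name and the sample inputs are hypothetical; the formulas are those just stated.

```python
def pearson_parameters(alpha3, alpha4):
    """Parameters of (1/y) dy/dt = (a - t) / (b0 + b1 t + b2 t^2) in standard units."""
    delta = (2 * alpha4 - 3 * alpha3 ** 2 - 6) / (alpha4 + 3)
    denom = 2 * (1 + 2 * delta)          # assumes delta is not exactly -1/2
    return {
        "delta": delta,
        "a": -alpha3 / denom,
        "b0": (2 + delta) / denom,
        "b1": alpha3 / denom,
        "b2": delta / denom,
    }


normal_case = pearson_parameters(0.0, 3.0)       # alpha3 = 0, alpha4 = 3
type_three_case = pearson_parameters(1.0, 4.5)   # chosen so that 2*alpha4 = 3*alpha3^2 + 6
```

In the first case the equation reduces to $(1/y)\,dy/dt = -t$, the normal curve; in the second $\delta$ vanishes while $\alpha_3$ does not, so the quadratic in the denominator degenerates to a linear factor.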

Based upon the difference in the nature of the roots of the equation
$b_0 + b_1 t + b_2 t^2 = 0$, there have been derived thirteen types of curves. Of the
particularly noteworthy ones, the normal curve and Type III may be mentioned.
The criterion for the normal curve is $\alpha_3 = \delta = 0$; that for Type III is

⁷ Carver, H. C., Frequency Curves, Handbook of Mathematical Statistics, p. 104.





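$\delta = 0$ and $\alpha_3 \neq 0$. In order to fix the form in a particular case, we may refer
to Pearson's Chart $\beta_1$, $\beta_2$ Distribution⁸ where

\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} = \alpha_3^2; \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \alpha_4, \]

and

\[ \frac{b_1^2}{4b_0b_2} = \frac{\beta_1(\beta_2 + 3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)} = \frac{\alpha_3^2}{4\delta(2 + \delta)}, \]

or to Elderton's Frequency Curves and Correlation.⁹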


Section IV. The Inverse Sampling, Our Problem 

It is now our problem to study the theory of inverse sampling, by which we 
mean that given the characteristics of a single sample drawn at random from a 
parent population, we wish to ascertain the probability that the corresponding 
characteristics of that parent population do not differ from those observed in 
the sample by more than a specified amount. To illustrate, suppose we are 
interested in knowing the average height of 1000 freshman students to which 
reference has already been made. Due to the fact that it takes too much time 
or is otherwise impractical to measure all of them so as to obtain the true average, 
we select at random one hundred of them and measure the heights of these one 
hundred individuals. Suppose the mean, the standard deviation, and the 
skewness of this sample of one hundred are computed and they are as follows: 

$M$ = 67.99

$\sigma$ = 2.327

$\alpha_3$ = - .12299


Now assuming that the true mean of the entire 1000 heights is unknown, let
us find the probability that the true mean of this parent population lies between
$M_x = a$ and $M_x = b$ by what we know of the characteristics of the observed
sample of one hundred as recorded above. It is clear that if we can obtain an
equation, $y = f(M_x)$, of the frequency curve associated with the distribution of
hypothetical means of this parent population, we shall be able to ascertain the
probability we desire by evaluating the following integral expression:

\[ \int_a^b f(M_x)\,dM_x. \]



⁸ Pearson, K., Tables for Statisticians and Biometricians, Vol. II, front page.

⁹ Elderton, W. P., Frequency Curves and Correlation, Table VI, opposite p. 46.





In the same way we can find the probability that the standard deviation of 
the parent population lies between two definite limits or that the skewness of 
the parent population lies between two definite limits. 

Our procedure will therefore be as follows: First, assuming the a priori
existence of a continuous sequence of hypothetical means of the parent population,
we investigate the relation between the distribution of these hypothetical
means of the parent population and the distribution of sample means. If such
a relation exists, we shall be able to find an expression for the most probable
value of the parent mean. Assuming the most probable value of the parent
mean to be the true mean of the parent population, we shall obtain an expression
for the most probable value of the standard deviation of the parent population.
Then it will be possible for us to express the frequency curve associated
with the distribution of hypothetical means of the parent population in the form
of $f(M_x)$. Similarly we may find the frequency functions associated with the
standard deviation and skewness of the parent population.

Before leaving this section, it is perhaps not out of place to say a word about
the connection of this theory of inverse sampling with Bayes's Theorem. The
theory of inverse sampling (which deals essentially with the problem of judging
the nature of a whole by observation of a part of it) belongs to the domain of
inductive probability, or inverse probability, upon which Bayes's Theorem was
founded. In order to solve a problem of inductive probability, it is necessary
to postulate the a priori existence of the causes from which an event takes place,
which, in our case, is the hypothetical means of the parent population.

This a priori hypothesis which gives rise to Bayes's Theorem has been viewed
with suspicion by a number of mathematical statisticians. For example, the
theorem has been called into question by such mathematicians as Bing, Venn,
Chrystal, and others, including several now living. But so far as the present
writer is aware, no definite conclusion has been reached. It is true that on the
one hand Bayes's Theorem has not been rigidly demonstrated and proved by
logic; but on the other hand the process of generalization from observational
data is justified within the limits of ordinary practical application. One who
holds Bayes's Theorem strongly may even say that the a priori hypothesis is
absolutely necessary to scientific inferences. Concerning this controversy,
Pearson takes a liberal point of view: "I hold this theorem [Bayes's Theorem]
not as rigidly demonstrated, but I think with Edgeworth that the hypothesis
of the equal distribution of ignorance is within the limits of practical life justified
by experience of statistical ratios, which a priori are unknown ···"¹⁰ He has
further remarked that "the practical man ··· will accept the results of inverse
probability of Bayes-Laplace brand till better are forthcoming."¹¹ Using


¹⁰ Pearson, On the Influence of Past Experience on Future Expectation, Philosophical
Magazine, Vol. 13, Jan.-June, 1907, p. 366.

¹¹ Pearson, K., The Fundamental Problem of Practical Statistics, Biometrika, Vol. 13,
1920-21, p. 3.





Pearson’s viewpoint, we shall proceed with our problem by postulating a priori 
the existence of hypothetical means of the parent population from which our 
sample is drawn. 


Part II. Fundamental Relation between the Moments of the Distribution of
Sampling Means and the Moments of the Distribution of the Hypothetical
Means Associated with the Parent Population

The characteristics of the distribution of sample means, as we have pointed out
in Part I, Section II, differ from those of the sample distribution only by a
constant multiple of $(1/r)^k$ where k is the order of the moments concerned. We
may write down the first six moments of the distribution of sample means:


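\[ \begin{aligned}
M_{\bar z} &= M_x \\
\mu_{2:\bar z} &= \frac{2!}{r^2}\left\{P_2\,\frac{s\mu_{2:x}}{2!}\right\} \\
\mu_{3:\bar z} &= \frac{3!}{r^3}\left\{P_3\,\frac{s\mu_{3:x}}{3!}\right\} \\
\mu_{4:\bar z} &= \frac{4!}{r^4}\left\{P_4\,\frac{s\mu_{4:x}}{4!} + P_2^2\,\frac{s^2\mu_{2:x}^2}{2!\,(2!)^2}\right\} \\
\mu_{5:\bar z} &= \frac{5!}{r^5}\left\{P_5\,\frac{s\mu_{5:x}}{5!} + P_3P_2\,\frac{s^2\mu_{3:x}\mu_{2:x}}{3!\,2!}\right\} \\
\mu_{6:\bar z} &= \frac{6!}{r^6}\left\{P_6\,\frac{s\mu_{6:x}}{6!} + P_4P_2\,\frac{s^2\mu_{4:x}\mu_{2:x}}{4!\,2!} + P_3^2\,\frac{s^2\mu_{3:x}^2}{2!\,(3!)^2} + P_2^3\,\frac{s^3\mu_{2:x}^3}{3!\,(2!)^3}\right\}
\end{aligned} \tag{4} \]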


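From these we immediately obtain

\[ \begin{aligned}
M_{\bar z} &= M_x \\
\sigma_{\bar z} &= \sigma_x\sqrt{\frac{s - r}{r(s - 1)}} \\
\alpha_{3:\bar z} &= \alpha_{3:x}\,\frac{s - 2r}{s - 2}\sqrt{\frac{s - 1}{r(s - r)}} \\
\alpha_{4:\bar z} - 3 &= \frac{(s - 1)(s^2 + s - 6rs + 6r^2)}{r(s - r)(s - 2)(s - 3)}\,\{\alpha_{4:x} - 3\} - \frac{6s(r - 1)(s - r - 1)}{r(s - r)(s - 2)(s - 3)},
\end{aligned} \tag{5} \]

etc.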





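If our parent population is infinite, which is a special case by allowing $s \to \infty$,
then we have

\[ \begin{aligned}
M_{\bar z} &= M_x \\
\sigma_{\bar z} &= \frac{1}{\sqrt{r}}\,\sigma_x \\
\alpha_{3:\bar z} &= \frac{1}{\sqrt{r}}\,\alpha_{3:x} \\
\alpha_{4:\bar z} - 3 &= \frac{1}{r}\,(\alpha_{4:x} - 3), \text{ etc.}
\end{aligned} \tag{6} \]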
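The finite-population relations can be checked against the weight data of Part I. The sketch below uses a hypothetical function name and takes the relations for the standard deviation, skewness and kurtosis of the sample mean in the form $\sigma_x\sqrt{(s-r)/(r(s-1))}$, $\alpha_{3:x}\,\frac{s-2r}{s-2}\sqrt{(s-1)/(r(s-r))}$ and the corresponding two-term expression for the excess; the inputs are the tabulated values for the 1000 weights with r = 100.

```python
import math


def sample_mean_shape(s, r, sigma_x, alpha3_x, alpha4_x):
    """Standard deviation, skewness and kurtosis of the mean of r out of s."""
    sigma = sigma_x * math.sqrt((s - r) / (r * (s - 1)))
    alpha3 = alpha3_x * (s - 2 * r) / (s - 2) * math.sqrt((s - 1) / (r * (s - r)))
    d = r * (s - r) * (s - 2) * (s - 3)
    excess = ((s - 1) * (s * s + s - 6 * r * s + 6 * r * r) / d * (alpha4_x - 3)
              - 6 * s * (r - 1) * (s - r - 1) / d)
    return sigma, alpha3, excess + 3


sigma_mean, alpha3_mean, alpha4_mean = sample_mean_shape(
    1000, 100, 17.22887, 0.63174, 3.98567)

# For comparison: the limiting values for an infinite parent population.
sigma_inf = 17.22887 / math.sqrt(100)
alpha3_inf = 0.63174 / math.sqrt(100)
```

The finite-population factor lowers the standard deviation of the mean from about 1.72 to about 1.64 and the skewness from about .063 to about .053, in agreement with the values quoted for the weight example.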


Let us now define f(t) as a frequency function of the distribution of sample
means $\bar z$ in standard units, i.e.,

\[ t = \frac{\bar z - M_{\bar z}}{\sigma_{\bar z}}. \tag{7} \]

Denoting the observed mean of a given sample by $m_1$ and making proper
substitutions of (5), we obtain

\[ t = \frac{m_1 - M_x}{\sigma_{\bar z}} = \frac{m_1 - M_x}{\sigma_x\sqrt{\dfrac{s - r}{r(s - 1)}}}. \tag{8} \]


It is clear that if we hold s, r and $\sigma_x$ constant and let $M_x$ vary, then t is a
function of $M_x$ only and consequently f(t) becomes a function of $M_x$.
Suppose now $M_x^{(1)}, M_x^{(2)}, M_x^{(3)}, \cdots$ be a continuous sequence of hypothetical
means, which $M_x$ has an equal chance to assume. These hypothetical means
will certainly lie in a linear interval between their natural limits. Then the
probability that $M_x$ lies between $M_x \pm \tfrac{1}{2}\,dM_x$ is $f(t)\,dM_x$. Therefore, to obtain
the probability that $M_x$ lies in the interval $M_x^{(1)} \le M_x \le M_x^{(2)}$ it is only
necessary to carry out the integration of this expression:

\[ \int_{M_x^{(1)}}^{M_x^{(2)}} f(t)\,dM_x. \tag{9} \]



There is no question as regards the existence of this integral in case of an 
infinite parent population. As for a finite population, we may still use this 
continuous function as an interpolation function to the true discontinuous 
function. 

Let us now define P(t) as the probability function for which the hypothetical
mean of the parent population falls within certain specified limits. Considering




$\mu'_{n:p}$ as the nth moment of this probability function about a fixed point, we will
have the following relation:


( 10 ) 


Mn: p 








where $l$ and $-l$ are their natural limits.

Since from (8), $M_x = m_1 - t\sigma_{\bar z}$, then after substitution, we obtain


( 11 ) 


( 12 ) 


mi+Z 






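\[ \begin{aligned}
\mu'_{1:p} &= M_p = m_1 \\
\mu'_{2:p} &= m_1^2 + \mu_{2:\bar z} \\
\mu'_{3:p} &= m_1^3 + 3m_1\mu_{2:\bar z} - \mu_{3:\bar z} \\
\mu'_{4:p} &= m_1^4 + 6m_1^2\mu_{2:\bar z} - 4m_1\mu_{3:\bar z} + \mu_{4:\bar z}, \text{ etc.}
\end{aligned} \]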

The first relation $M_p = m_1$ is important because it shows that the mean of
the hypothetical means of the parent population is equal to the mean of the
observed sample drawn from it. To state this in a theorem, we will have

Theorem I. The expected value of a parent mean is equal to the mean of an
observed sample chosen from the parent population.

We now wish to express the moments of the probability function about its
mean in terms of the moments of sample distribution. In general, the nth
moment of any frequency distribution about its mean, $\mu_n$, can be expressed in
terms of its moments about a fixed point after the following fashion:

\[ \mu_n = \mu'_n - \binom{n}{1}M\mu'_{n-1} + \binom{n}{2}M^2\mu'_{n-2} - \cdots + (-1)^n\binom{n}{n}M^n. \tag{13} \]





Therefore when we substitute (11) into (13) we obtain

\[ \mu_{n:p} = \sum_{k=0}^{n}(-1)^{n-k}\binom{n}{k}m_1^{\,n-k}\sum_{j=0}^{k}(-1)^{j}\binom{k}{j}m_1^{\,k-j}\mu_{j:\bar z}, \]

with $\mu_{0:\bar z} = 1$. Adding vertically each column, that is, collecting the terms in each $\mu_{j:\bar z}$, we obtain

\[ \mu_{n:p} = \sum_{j=0}^{n}(-1)^{j}\,m_1^{\,n-j}\mu_{j:\bar z}\sum_{k=j}^{n}(-1)^{n-k}\binom{n}{k}\binom{k}{j}. \]

The first row of the above expression is equal to $m_1^n(1 - 1)^n = 0$; the second
row is equal to

\[ -m_1^{\,n-1}\mu_{1:\bar z}\,\frac{n!}{1!\,(n-1)!}\left[1 - {}_{n-1}C_1 + {}_{n-1}C_2 - \cdots + (-1)^{n-1}\,{}_{n-1}C_{n-1}\right] = -m_1^{\,n-1}\mu_{1:\bar z}\,\frac{n!}{1!\,(n-1)!}\,(1 - 1)^{n-1} = 0; \]

the third row is equal to

\[ m_1^{\,n-2}\mu_{2:\bar z}\,\frac{n!}{2!\,(n-2)!}\left[1 - {}_{n-2}C_1 + {}_{n-2}C_2 - \cdots + (-1)^{n-2}\,{}_{n-2}C_{n-2}\right] = m_1^{\,n-2}\mu_{2:\bar z}\,\frac{n!}{2!\,(n-2)!}\,(1 - 1)^{n-2} = 0, \]

and similarly all the other rows turn out to be zero except the last one, which gives

\[ \mu_{n:p} = (-1)^n\mu_{n:\bar z}. \tag{14} \]






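This may be rewritten as

\[ \mu_{2n:p} = \mu_{2n:\bar z}, \qquad \mu_{2n+1:p} = -\mu_{2n+1:\bar z}, \tag{15} \]

or in standard units

\[ \alpha_{2n:p} = \alpha_{2n:\bar z}, \qquad \alpha_{2n+1:p} = -\alpha_{2n+1:\bar z}. \]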

The results¹² of (15) are important and fundamental because they establish
the relation between the Theory of Inverse Sampling and the Theory of Sampling.
Therefore we may formulate the following theorems:

Theorem II. The even moments of the distribution of the hypothetical means
of a parent population about its mean are equal to the corresponding even moments
of the distribution of the sample means about the mean.

Theorem III. The odd moments of the distribution of the hypothetical
means of a parent population about its mean are equal to the negative of the
corresponding odd moments of the distribution of the sample means about the
mean.
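Theorems II and III amount to the statement that the hypothetical means are distributed like $m_1$ minus the deviation of a sample mean from its own mean. A minimal numerical sketch, using a made-up skew distribution for the sample means and a made-up observed mean (both assumptions for illustration only), shows the alternation of signs exactly:

```python
from fractions import Fraction as F


def central_moment(dist, n):
    """nth moment about the mean of a discrete distribution {value: probability}."""
    mean = sum(p * v for v, p in dist.items())
    return sum(p * (v - mean) ** n for v, p in dist.items())


# Hypothetical skew distribution of sample means and observed sample mean m1.
sample_means = {F(1): F(1, 2), F(2): F(1, 3), F(5): F(1, 6)}
m1 = F(3)

# Reflection used in the text: M_x = m1 - t * sigma, i.e. m1 minus the
# deviation of the sample mean from its own mean.
mean_z = sum(p * v for v, p in sample_means.items())
hypothetical_means = {m1 - (v - mean_z): p for v, p in sample_means.items()}

mean_p = sum(p * v for v, p in hypothetical_means.items())
moments_z = {n: central_moment(sample_means, n) for n in range(2, 7)}
moments_p = {n: central_moment(hypothetical_means, n) for n in range(2, 7)}
```

Here `mean_p` equals `m1`, the even entries of `moments_p` and `moments_z` coincide, and the odd entries differ only in sign.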

Since the even moments of the two distributions are the same, while the odd
moments differ only in sign, it is evident that for symmetrical distributions, the
two curves f(t) and P(t) are exactly identical, because in a symmetrical distribution
all the odd moments about the mean are bound to vanish. In case of
nonsymmetrical distributions, the curve P(t) is nothing but a vertical reflection
of the curve f(t) as shown in the figure:

[Figure: the curve f(t) and its vertical reflection P(t)]

In other words, if f(t), for instance, assumes Pearson's Type III Function, then
P(t) also assumes Pearson's Type III Function except that their skewness is
different in sign though equal numerically. We therefore state our theorem as
follows:

¹² So far as the writer is aware, these theorems were first developed by Professor H. C.
Carver.






Theorem IV. The curves for the distribution of the hypothetical means of the 
parent population and the curve for the distribution of the means of the sample 
obtained from the parent population are symmetrically situated and one is a 
vertical reflection of the other. 

Part III. Inverse Sampling Associated with a Normal Parent Population

We shall be concerned in this part of our discussion with a normal parent
population. In accordance with the characteristics of a normal parent population
we wish to investigate the most probable values of its mean and variance,
thereby obtaining the distributions of the hypothetical means and variances of
the parent population.

Section I. Most Probable Value of the Mean of the Parent Population 

In Part I, Section III, we have mentioned Pearsonian Types of Frequency
Curves whose differential equation is

\[ \frac{1}{y}\frac{dy}{dt} = \frac{a - t}{b_0 + b_1 t + b_2 t^2}. \]

It is clear that the mode of these curves is at $t = a$, provided the mode exists.
But to recapitulate:

\[ a = \frac{-\alpha_3}{2(1 + 2\delta)}, \]

where

\[ \delta = \frac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}; \]


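consequently for the mode of the distribution of sample means, we have

\[ a_{\bar z} = \frac{-\alpha_{3:\bar z}}{2(1 + 2\delta_{\bar z})}, \tag{16} \]

where

\[ \delta_{\bar z} = \frac{2\alpha_{4:\bar z} - 3\alpha_{3:\bar z}^2 - 6}{\alpha_{4:\bar z} + 3} = \frac{2(s-1)(s-2)(s^2 + s - 6rs + 6r^2)(\alpha_{4:x} - 3) - 12s(r-1)(s-2)(s-r-1) - 3(s-1)(s-3)(s-2r)^2\alpha_{3:x}^2}{(s-2)\{(s-1)(s^2 + s - 6rs + 6r^2)(\alpha_{4:x} - 3) - 6s(r-1)(s-r-1) + 6r(s-r)(s-2)(s-3)\}} \tag{17} \]

and

\[ t = \frac{\bar z - M_{\bar z}}{\sigma_{\bar z}} = \frac{\bar z - M_x}{\sigma_x\sqrt{\dfrac{s - r}{r(s - 1)}}} = -\,\frac{\dfrac{s - 2r}{s - 2}\sqrt{\dfrac{s - 1}{r(s - r)}}\;\alpha_{3:x}}{2(1 + 2\delta_{\bar z})}. \tag{18} \]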





Now according to Theorem IV, the mode of the probability function P(t) is
situated symmetrically with respect to the mode of the frequency function f(t)
of the distribution of sample means; hence, for the mode of the probability function
of hypothetical means of the parent population, we have

\[ \frac{m_1 - M_x}{\sigma_x\sqrt{\dfrac{s - r}{r(s - 1)}}} = \frac{\dfrac{s - 2r}{s - 2}\sqrt{\dfrac{s - 1}{r(s - r)}}\;\alpha_{3:x}}{2(1 + 2\delta_{\bar z})}, \tag{19} \]

where $\delta_{\bar z}$ remains unchanged because it is a function of $\alpha_{3:\bar z}^2$ and $\alpha_{4:\bar z}$, each being
always positive.

Solving for $M_x$, which will now be the most probable value of the mean of the
parent population and hence denoted by $\hat M_x$, we have

\[ \hat M_x = m_1 - \frac{s - 2r}{r(s - 2)}\cdot\frac{\sigma_x\alpha_{3:x}}{2(1 + 2\delta_{\bar z})}. \tag{20} \]


It is interesting to note that if s = 2r, this expression yields $\hat M_x = m_1$, irrespective
of the law of distribution of the parent population provided only that $\delta_{\bar z}$
is not exactly equal to $-\tfrac{1}{2}$. But since the Pearson's function is used for graduation,
one should not fail to see that the mode so obtained gives only an approximation
to the true mode. Therefore we state a theorem as follows:

Theorem V. If a sample is composed of one-half of the variates of the parent
population from which the sample is chosen, then the best approximated 'most
probable value' of the mean of the parent population is equal to the mean of the
observed sample provided only that $\delta_{\bar z}$ is not exactly equal to $-\tfrac{1}{2}$.

It is further observed that if $\alpha_{3:x} = 0$ but $\delta_{\bar z} \neq -\tfrac{1}{2}$, then the expression (20)
will likewise yield $\hat M_x = m_1$. But $\alpha_{3:x} = 0$ implies that the frequency curve of
the parent population is symmetrical. Hence

Theorem VI. For any symmetrical curves associated with the distribution of
the parent population, the best approximated 'most probable value' of the mean
of the parent population is equal to the mean of the observed sample provided
$\delta_{\bar z}$ is not exactly equal to $-\tfrac{1}{2}$.
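Theorems V and VI can be illustrated by evaluating the most probable value directly. The sketch below is hypothetical in its names and inputs. It assumes the most probable value has the form $m_1 - \frac{s-2r}{r(s-2)}\cdot\frac{\sigma_x\alpha_{3:x}}{2(1+2\delta)}$, with $\delta$ the Pearson criterion of the sample-mean distribution written in terms of s, r and the parent $\alpha_3$, $\alpha_4$, and it treats the parent constants as known, which in practice they are not.

```python
def delta_sample_mean(s, r, alpha3_x, alpha4_x):
    """Pearson's delta for the distribution of means of r drawn from s."""
    core = ((s - 1) * (s * s + s - 6 * r * s + 6 * r * r) * (alpha4_x - 3)
            - 6 * s * (r - 1) * (s - r - 1))
    numerator = (2 * (s - 2) * core
                 - 3 * (s - 1) * (s - 3) * (s - 2 * r) ** 2 * alpha3_x ** 2)
    denominator = (s - 2) * (core + 6 * r * (s - r) * (s - 2) * (s - 3))
    return numerator / denominator


def most_probable_mean(m1, s, r, sigma_x, alpha3_x, alpha4_x):
    """Approximate most probable parent mean; assumes delta is not -1/2."""
    delta = delta_sample_mean(s, r, alpha3_x, alpha4_x)
    shift = (s - 2 * r) / (r * (s - 2)) * sigma_x * alpha3_x / (2 * (1 + 2 * delta))
    return m1 - shift


m1 = 67.99
half_sample = most_probable_mean(m1, 200, 100, 2.5, 0.4, 3.3)       # s = 2r
symmetric = most_probable_mean(m1, 1000, 100, 2.5, 0.0, 3.3)        # alpha3 = 0
skew_parent = most_probable_mean(m1, 1000, 100, 17.22887, 0.63174, 3.98567)
```

In the first two calls the shift vanishes identically, as the two theorems state; in the third, a positively skew parent with s > 2r pulls the most probable mean slightly below the observed one.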

But we will investigate further the most probable value of the mean of a
normal parent population, and we know that in a normal distribution the
moments bear the following relation:¹³

\[ \alpha_{2n} = \frac{(2n)!}{2^n\,n!}, \qquad \alpha_{2n+1} = 0, \tag{21} \]

¹³ Carver, H. C., Frequency Curves, Handbook of Mathematical Statistics, p. 97.





i.e., $\alpha_3 = 0$, $\alpha_4 = 3$, $\alpha_5 = 0$, $\alpha_6 = 15$, $\alpha_7 = 0$, $\alpha_8 = 105$, etc.

Consequently for a normal parent population the $\delta_{z_x}$ function in (17) is
immediately reduced to

(22) $\delta_{z_x} = \dfrac{2s(r-1)(s-r-1)}{s(r-1)(s-r-1) - r(s-r)(s-2)(s-3)}$

Let us, first of all, investigate the possibility that this expression will be
exactly equal to $-\tfrac{1}{2}$ for positive integral values of r and s.

Suppose we set

$\dfrac{2s(r-1)(s-r-1)}{s(r-1)(s-r-1) - r(s-r)(s-2)(s-3)} = -\dfrac{1}{2}$

and solve r in terms of s. Thus we obtain

(23) $r = \dfrac{s}{2} \pm \dfrac{1}{2}\sqrt{s^2 + \dfrac{20s(s-1)}{s^2-10s+6}}$

If $s \geq 10$, then the second term on the right side is positive. As it is absurd
that r should be greater than s, the positive sign of the double sign
should not be taken. Then, as the second term is obviously greater than $s/2$, the
right member will be negative. Since r cannot be negative, no positive integral
values of r and s, for which $s \geq r$, can satisfy (23). For $s < 10$, there are only
nine positive integers; and direct substitution of each will tell us that only when
s = 1, 2, or 3 is r a positive integer, which is either 1 or 2. As these are trifling
cases, because a parent population can never be so small, we may safely say that
for a normal parent population

(24) $\hat{M}_x = m_1$

Theorem VII. For a normal parent population, the best approximated ‘most
probable value’ of the mean of the parent population is equal to the mean of the
observed sample from it.
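The exclusion of $\delta_{z_x} = -\tfrac{1}{2}$ can also be checked by direct enumeration. The following is a minimal sketch, assuming formula (22); the function name `half_delta_solutions` and the search bound are illustrative choices and not part of the original argument. Setting (22) equal to $-\tfrac{1}{2}$ is equivalent to $5s(r-1)(s-r-1) = r(s-r)(s-2)(s-3)$, which the code tests in exact integer arithmetic for $s \geq 4$.

```python
def half_delta_solutions(s_max):
    """List all (r, s) with 4 <= s <= s_max, 1 <= r <= s, where (22) equals -1/2."""
    hits = []
    for s in range(4, s_max + 1):
        for r in range(1, s + 1):
            q = s * (r - 1) * (s - r - 1)
            d = r * (s - r) * (s - 2) * (s - 3)
            # delta = 2q / (q - d); delta = -1/2 is equivalent to 5q = d.
            # The guard q != d skips the degenerate 0/0 form.
            if q != d and 5 * q == d:
                hits.append((r, s))
    return hits

solutions = half_delta_solutions(200)
```

Under these assumptions the list comes back empty, in agreement with the statement that only the degenerate cases s = 1, 2, 3 remain.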

For an infinite parent population, i.e., $s \to \infty$, (20) yields on reduction

(25) $\hat{M}_x = m_1 - \dfrac{1}{r}\cdot\dfrac{\sigma_x\,\alpha_{3:x}}{2(1+2\delta_{z_x})}$

where

$\delta_{z_x} = \dfrac{2(\alpha_{4:x}-3) - 3\alpha_{3:x}^2}{(\alpha_{4:x}-3) + 6r}$

Formula (25) yields immediately $\hat{M}_x = m_1$ if $\alpha_{3:x} = 0$ and $\delta_{z_x} \neq -\tfrac{1}{2}$. For a
normal parent population $\delta_{z_x} = 0$. Hence Theorem VI and Theorem VII both
hold for the infinite case.

Section II. Most Probable Value of the Standard Deviation of the Parent Population

To find the most probable value of the standard deviation of the parent population,
we shall assume the mean of the parent population to be the best approximated
‘most probable value’ of the mean, which we have obtained in the
preceding section. This assumption is necessary since we do not know the true
mean of the parent population.

Now, to start with, we shall consider ${}_sC_r$ possible samples, each consisting of
r variates. The second moment of each sample computed about the best
approximated ‘most probable value’ of the mean of the parent population may
be written as

$z_1 = \dfrac{1}{r}\{(x_1-m_1)^2 + (x_2-m_1)^2 + (x_3-m_1)^2 + \cdots + (x_r-m_1)^2\}$

$z_2 = \dfrac{1}{r}\{(x_1-m_1)^2 + (x_3-m_1)^2 + (x_4-m_1)^2 + \cdots + (x_{r+1}-m_1)^2\}$

$\cdots$

$z_{{}_sC_r} = \dfrac{1}{r}\{(x_{s-r+1}-m_1)^2 + (x_{s-r+2}-m_1)^2 + (x_{s-r+3}-m_1)^2 + \cdots + (x_s-m_1)^2\}$


If we write $(x_i - m_1)^2 = y_i$, it is clear that the above may be considered as a
distribution of sample means drawn from a parent population $y_1, y_2, y_3, \ldots, y_s$;
and consequently

(26)
$M_{z_y} = M_y$

$\sigma_{z_y} = \sigma_y\sqrt{\dfrac{s-r}{r(s-1)}}$

$\alpha_{3:z_y} = \dfrac{s-2r}{s-2}\sqrt{\dfrac{s-1}{r(s-r)}}\;\alpha_{3:y}$

$\alpha_{4:z_y} - 3 = \dfrac{(s-1)(s^2+s-6rs+6r^2)}{r(s-r)(s-2)(s-3)}\{\alpha_{4:y}-3\} - \dfrac{6s(r-1)(s-r-1)}{r(s-r)(s-2)(s-3)}$


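Now the nth moment of y about a fixed point may be written as

(27)
$\mu'_{n:y} = \dfrac{1}{s}\sum\{(x-M_x) + (M_x-m_1)\}^{2n}$

$\quad = \mu_{2n:x} + \binom{2n}{1}\mu_{2n-1:x}(M_x-m_1) + \binom{2n}{2}\mu_{2n-2:x}(M_x-m_1)^2 + \binom{2n}{3}\mu_{2n-3:x}(M_x-m_1)^3$

$\qquad + \binom{2n}{4}\mu_{2n-4:x}(M_x-m_1)^4 + \cdots + (M_x-m_1)^{2n}.$

On the assumption that our parent population is normally distributed and
due to the fact that in a normally distributed function

$\alpha_{2n} = \dfrac{(2n)!}{2^n\,n!}$ and $\alpha_{2n+1} = 0$ [See (21)],

the expression (27) immediately takes this form:

(28)
$\mu'_{n:y} = \dfrac{(2n)!}{2^n\,n!}\sigma_x^{2n} + \binom{2n}{2}\dfrac{(2n-2)!}{2^{n-1}(n-1)!}\sigma_x^{2n-2}(M_x-m_1)^2$

$\qquad + \binom{2n}{4}\dfrac{(2n-4)!}{2^{n-2}(n-2)!}\sigma_x^{2n-4}(M_x-m_1)^4 + \cdots + (M_x-m_1)^{2n}$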


Imposing the condition mentioned at the beginning of this section (i.e., $M_x$
assumes its best approximated ‘most probable value’ $m_1$), then all the terms drop
out except the first one. Hence, as a final form, we have

(29) $\mu'_{n:y} = \dfrac{(2n)!}{2^n\,n!}\sigma_x^{2n}$

(30)
$\mu'_{1:y} = M_y = \sigma_x^2$
$\mu'_{2:y} = 3\sigma_x^4$
$\mu'_{3:y} = 15\sigma_x^6$
$\mu'_{4:y} = 105\sigma_x^8$
etc.


It follows that the kth moment of y about its mean will be

(31)
$\mu_{k:y} = \mu'_{k:y} - \binom{k}{1}\mu'_{k-1:y}M_y + \binom{k}{2}\mu'_{k-2:y}M_y^2 - \cdots + (-1)^k M_y^k$

$\quad = \sigma_x^{2k}\left[\dfrac{(2k)!}{2^k\,k!} - \binom{k}{1}\dfrac{(2k-2)!}{2^{k-1}(k-1)!} + \binom{k}{2}\dfrac{(2k-4)!}{2^{k-2}(k-2)!} - \cdots + (-1)^k\right]$

$\quad = \sigma_x^{2k}\dfrac{(2k)!}{2^k\,k!}\left[1 - \dfrac{k}{1!(2k-1)} + \dfrac{k(k-1)}{2!(2k-1)(2k-3)} - \dfrac{k(k-1)(k-2)}{3!(2k-1)(2k-3)(2k-5)} + \cdots + (-1)^k\dfrac{k!}{k!(2k-1)(2k-3)(2k-5)\cdots(3)(1)}\right]$


(32)
$\mu_{1:y} = 0$
$\mu_{2:y} = 2\sigma_x^4$
$\mu_{3:y} = 8\sigma_x^6$
$\mu_{4:y} = 60\sigma_x^8$
$\mu_{5:y} = 544\sigma_x^{10}$
$\mu_{6:y} = 6040\sigma_x^{12}$
etc.

And therefore we obtain

(33)
$\alpha_{3:y} = 2\sqrt{2}$
$\alpha_{4:y} = 15$
$\alpha_{5:y} = 68\sqrt{2}$
$\alpha_{6:y} = 755$
etc.
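The moments of y about its mean can be recomputed from the moments about the origin by the binomial expansion used in (31). The sketch below assumes (29), with every moment expressed in units of the matching power of $\sigma_x^2$; the helper names `raw_moment` and `central_moment` are illustrative only.

```python
from math import comb, factorial

def raw_moment(n):
    """n-th moment of y about the origin, in units of sigma_x**(2n), per (29)."""
    return factorial(2 * n) // (2 ** n * factorial(n))

def central_moment(k):
    """k-th moment of y about its mean M_y = sigma_x**2 (here taken as 1)."""
    return sum((-1) ** j * comb(k, j) * raw_moment(k - j) for j in range(k + 1))

# Coefficients of sigma_x**(2k) for k = 1, ..., 6.
central = [central_moment(k) for k in range(1, 7)]

# Standardized moments: divide by mu_2**(k/2), with mu_2 = 2.
alpha_5 = central[4] / 2 ** 2.5
alpha_6 = central[5] / 2 ** 3
```

Dividing by powers of $\mu_{2:y} = 2\sigma_x^4$ then gives the standardized values, for instance $544/2^{5/2} = 68\sqrt{2}$ and $6040/2^3 = 755$.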


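Making proper substitution of (30), (32), (33) into (26), we obtain

(34)
$M_{z_y} = \sigma_x^2$

$\sigma_{z_y} = \sqrt{\dfrac{2(s-r)}{r(s-1)}}\;\sigma_x^2$

$\alpha_{3:z_y} = \dfrac{2(s-2r)}{s-2}\sqrt{\dfrac{2(s-1)}{r(s-r)}}$

$\alpha_{4:z_y} - 3 = \dfrac{12(s-1)(s^2+s-6rs+6r^2) - 6s(r-1)(s-r-1)}{r(s-r)(s-2)(s-3)}$

For an infinite parent population, i.e., $s \to \infty$, we have

(35)
$M_{z_y} = \sigma_x^2$

$\sigma_{z_y} = \sqrt{\dfrac{2}{r}}\;\sigma_x^2$

$\alpha_{3:z_y} = 2\sqrt{\dfrac{2}{r}}$

$\alpha_{4:z_y} - 3 = \dfrac{12}{r}$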
Now again with reference to Pearsonian Types of Curves for which the mode
is at $t = a$, we have for the mode of the distribution of sample means $\check{z}_y$,

(36) $\check{z}_y = M_{z_y} - \dfrac{\sigma_{z_y}\,\alpha_{3:z_y}}{2(1+2\delta_{z_y})}$

where

(37) $\delta_{z_y} = 2 - \dfrac{3\alpha_{3:z_y}^2 + 12}{\alpha_{4:z_y}+3} = 2 - \dfrac{(s-3)[4(s-2r)^2(s-1) + 2r(s-2)^2(s-r)]}{(s-2)[2(s-1)(s^2+s-6rs+6r^2) + r(s-r)(s-2)(s-3) - s(r-1)(s-r-1)]}$


Substituting (34) into (36), we obtain

(38) $\check{z}_y = \sigma_x^2 - \sqrt{\dfrac{2(s-r)}{r(s-1)}}\;\sigma_x^2\cdot\dfrac{s-2r}{s-2}\sqrt{\dfrac{2(s-1)}{r(s-r)}}\cdot\dfrac{1}{1+2\delta_{z_y}}$

By Theorem IV, the best approximated ‘most probable value’ of the standard
deviation of the parent population is obtained from (38) by changing the sign
of the second term of the right member and replacing $\check{z}_y$ by $m_2$. Thus we have

$m_2 = \sigma_x^2 + \sqrt{\dfrac{2(s-r)}{r(s-1)}}\;\sigma_x^2\cdot\dfrac{s-2r}{s-2}\sqrt{\dfrac{2(s-1)}{r(s-r)}}\cdot\dfrac{1}{1+2\delta_{z_y}}$


Solving for $\sigma_x^2$, which is now the best approximated ‘most probable value’ and
should therefore be denoted by $\hat{\sigma}_x^2$, we then have

(39) $\hat{\sigma}_x^2 = \hat{\mu}_{2:x} = \dfrac{m_2}{1 + \dfrac{2(s-2r)}{r(s-2)(1+2\delta_{z_y})}}$

The best approximated ‘most probable value’ of the standard deviation may
therefore be written down as

$\hat{\sigma}_x = \dfrac{\sigma_s}{\sqrt{1 + \dfrac{2(s-2r)}{r(s-2)(1+2\delta_{z_y})}}}$

where $\sigma_s = \sqrt{m_2}$


This formula is, of course, subject to a systematic error that arises from the fact
that we employ the square root of the best estimated ‘most probable value’ of
the variance. It may be shown, however, that when r is large, the error is small.

Consequently, we have the following theorem: 

Theorem VIII. For a normal parent population, the best approximated
‘most probable value’ of the standard deviation of the parent population is
equal to

$\dfrac{\sigma_s}{\sqrt{1 + \dfrac{2(s-2r)}{r(s-2)(1+2\delta_{z_y})}}}$

where $\sigma_s$ is the standard deviation of an observed sample from the parent population
and $\delta_{z_y}$ is a function of r and s as expressed in (37).
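As a numerical illustration, the sketch below evaluates $\delta_{z_y}$ and the variance estimate for given r, s and sample second moment $m_2$. It assumes (37) in the form $\delta_{z_y} = 2 - (3\alpha_{3:z_y}^2 + 12)/(\alpha_{4:z_y} + 3)$ and (39) as $m_2$ divided by $1 + 2(s-2r)/[r(s-2)(1+2\delta_{z_y})]$; the function names are hypothetical.

```python
def delta_zy(r, s):
    """delta for the distribution of z_y under a normal parent, per (37)."""
    p = (s - 1) * (s * s + s - 6 * r * s + 6 * r * r)
    num = (s - 3) * (4 * (s - 2 * r) ** 2 * (s - 1) + 2 * r * (s - 2) ** 2 * (s - r))
    den = (s - 2) * (2 * p + r * (s - r) * (s - 2) * (s - 3) - s * (r - 1) * (s - r - 1))
    return 2 - num / den

def variance_estimate(m2, r, s):
    """Best approximated 'most probable value' of the variance, per (39)."""
    correction = 2 * (s - 2 * r) / (r * (s - 2) * (1 + 2 * delta_zy(r, s)))
    return m2 / (1 + correction)

# Sample of r = 100 from a parent population of s = 1000, with m2 = 5.4152.
estimate = variance_estimate(5.4152, 100, 1000)
```

For r = 100, s = 1000 and $m_2 = 5.4152$ this gives a value close to 5.328, and for $s = 2r$ the correction term vanishes so that the estimate equals $m_2$.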

It is interesting to note from (39) that when $s = 2r$, $\hat{\sigma}_x = \sigma_s$ provided $\delta_{z_y} \neq -\tfrac{1}{2}$.

However, from (37), $\delta_{z_y}$ cannot be equal to $-\tfrac{1}{2}$ in the case of $s = 2r$, where s
and r are both positive integers. Consequently, we may state this fact in
another theorem:

Theorem IX. If a sample is composed of exactly half of the variates of a 
normal parent population, then the best approximated ‘most probable value’ 
of the standard deviation of that parent population is equal to the standard 
deviation of an observed sample from it. 

For an infinite parent population, (39) yields on reduction

(40) $\hat{\sigma}_x = \sqrt{\dfrac{r}{r+2}}\;\sigma_s$

for $\delta_{z_y} = 0$ when $s \to \infty$.


Professor H. C. Carver has worked out a relation between the most probable value of 
a:* and that of x by assuming that the latter is distributed according to a Type III distribu¬ 
tion. With his permission, I state the result as follows: 

M. P. V. xr> * (M. P. V. xy 



where X 




and Mx 


the distance of the mean from the origin. 





Theorem X. For an infinite normal parent population, the best approximated
‘most probable value’ of the standard deviation of the parent population is
equal to the standard deviation of an observed sample multiplied by $\sqrt{\dfrac{r}{r+2}}$.

Section III. Distribution of the Hypothetical Means of the Parent Population

In the preceding two sections, we have obtained the best approximated ‘most
probable value’ of the mean and the best approximated ‘most probable value’
of the standard deviation of a parent population assumed to be normal. We
are now in the position to characterize the distribution of these hypothetical
means by assuming that the best approximated ‘most probable value’ of the
mean of the parent population be its mean and the best approximated ‘most
probable value’ of the standard deviation of the parent population be its standard
deviation. Such a characterization is subject to its own probable error.

Due to the fact that our parent population is normal by assumption, formulae
(4), which we are to use this time, have to be modified by the proper substitution
of the recursion relation of the moments of a normal distribution [See (21)].
After such modifications, they assume the following forms:

'M,, = Ms 

^ p - 

M2;zx = ^2 M2:x 

= 0 

(41) ^ /p I 

= 0 

i^:*s ~ (^6 ”1" 3 P 4 -P 2 S + ^2S^)m2:x 

In accordance with Theorems II and III, we therefore have for the distribution 
of the means of the parent population the following: 

( Mm^ = mi 



( 42 ) 




iif.Mx = — Ms;»x = 0 

M6:A#x = —Mftrrx = 0 

= M.:.x = ^ + 3P.P2S + PWr^l, 

= -^ (A + 3^4^28 + 





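Consequently

(43)
$M_{M_x} = m_1$

$\sigma_{M_x} = \hat{\sigma}_x\sqrt{\dfrac{s-r}{r(s-1)}}$

$\alpha_{3:M_x} = 0$

$\alpha_{4:M_x} - 3 = -\dfrac{6s(r-1)(s-r-1)}{r(s-r)(s-2)(s-3)}$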


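For an infinite parent population, i.e., $s \to \infty$, we have

(44)
$M_{M_x} = m_1$

$\sigma_{M_x} = \dfrac{\hat{\sigma}_x}{\sqrt{r}} = \dfrac{1}{\sqrt{r}}\sqrt{\dfrac{r}{r+2}}\;\sigma_s = \dfrac{\sigma_s}{\sqrt{r+2}}$ [from (40)]

$\alpha_{3:M_x} = 0$

$\alpha_{4:M_x} - 3 = 0$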


Now if we can find the equation of the curve associated with the distribution 
of the means of the parent population, we shall be able to ascertain the prob¬ 
ability that a mean lies within certain limits after a sample from the parent 
population has once been observed. 

Let us illustrate this by again referring to the same problem of the heights of
1000 freshman students as recorded in Table I. Considering this as our parent
population, which is almost normal with s = 1000, we take every tenth individual
height from the original list in which the 1000 heights are tabulated.
Thus we obtain a sample with r = 100. The frequency distribution of these 100
individual heights is shown in Table IV.


TABLE IV

Sample of 100 Heights Selected from the Parent Population of 1000 from Table I

Class          Frequency
62.5-64.4           9
64.5-66.4          16
66.5-68.4          31
68.5-70.4          29
70.5-72.4          13
72.5-74.4           2


We compute the mean, the standard deviation, the skewness, and the fourth
moment about the mean of this sample:

$m_1 = 67.99$
$m_2 = 5.415,2$            $\sigma_s = 2.327,058$
$m_3 = -1.549,872$         $\alpha_{3:s} = -.122,991$
$m_4 = 71.615,158$         $\alpha_{4:s} = 2.442,17$

From Theorem VII,

$\hat{M}_x = 67.99$

From (37) and (39), we obtain

$\delta_{z_y} = -.009,833$
$\hat{\mu}_{2:x} = 5.328,067$
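Substituting into (42), we have

$M_{M_x} = 67.99$
$\mu_{2:M_x} = .048,000,603,6$     $\sigma_{M_x} = .219,09$
$\mu_{3:M_x} = 0$                  $\alpha_{3:M_x} = 0$
$\mu_{4:M_x} = .006,898,429$       $\alpha_{4:M_x} = 2.994,03$
$\mu_{5:M_x} = 0$                  $\alpha_{5:M_x} = 0$
$\mu_{6:M_x} = .001,649,027$       $\alpha_{6:M_x} = 14.910,37$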


The coefficients of Charlier’s Type A Function (3) are as follows; 


.000,250 

4! 



= .000,000,1 

From the values we are justified in assuming that $M_x$ is normally distributed.
We may now ask ourselves concerning the probability that the mean of the
parent population, $M_x$, from which this sample is selected, exceeds 68.5 inches.

$t = \dfrac{68.5 - 67.99}{\sigma_{M_x}} = \dfrac{.51}{.21909} = 2.3278$

$P = \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{2.3278}^{\infty} e^{-t^2/2}\,dt = .009962$
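The tail area quoted here can be reproduced without tables. The following is a minimal sketch using the complementary error function from the Python standard library; the function name `upper_tail` is an illustrative choice, and the inputs 68.5, 67.99 and .21909 are the figures of the example above.

```python
from math import erfc, sqrt

def upper_tail(t):
    """P(T > t) for a standard normal variate T."""
    return 0.5 * erfc(t / sqrt(2))

# Deviation of 68.5 from the estimated mean, in units of sigma_{M_x}.
t = (68.5 - 67.99) / 0.21909
p = upper_tail(t)
```

With these inputs t is about 2.3278 and p is just under .01, matching the printed probability to the accuracy of the tables then in use.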





Let us now come back to investigation of the general case for the distribution
of the hypothetical means of the parent population. Because there is no definite
relation between the values of r and s, except $r \leq s$, and because, by assumption,
our parent population is normal, $\delta_{z_x}$ is a function of r and s (22); that is

$\delta_{z_x} = \dfrac{2s(r-1)(s-r-1)}{s(r-1)(s-r-1) - r(s-r)(s-2)(s-3)}$

Consequently, it is necessary for us to investigate the different values of $\delta_{z_x}$ with
respect to various combinations of r and s before we can tell which Type of
Pearson's Curves will best fit the distribution of the means of the parent population.
Hence, Table V:


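TABLE V

Relation of the Values of $\delta_{z_x}$ with Various Combinations of r and s

s = 10r     $r \geq 100$,   $\delta_{z_x} \geq -.0020$
            $r \geq 50$,    $\delta_{z_x} \geq -.0040$
            $r \geq 10$,    $\delta_{z_x} \geq -.0189$

s = 5r      $r \geq 100$,   $\delta_{z_x} \geq -.0040$
            $r \geq 50$,    $\delta_{z_x} \geq -.0080$
            $r \geq 10$,    $\delta_{z_x} \geq -.0397$

s = 2r      $r \geq 100$,   $\delta_{z_x} \geq -.0101$
            $r \geq 50$,    $\delta_{z_x} \geq -.0204$
            $r \geq 10$,    $\delta_{z_x} \geq -.1118$

s = r + 1,            r = any finite value,   $\delta_{z_x} = 0$
s = any finite value, r = 1,                  $\delta_{z_x} = 0$
$s \to \infty$,       r = any finite value,   $\delta_{z_x} = 0$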
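The tabulated boundary values can be recomputed from (22). The sketch below assumes that formula in the form $\delta_{z_x} = 2q/(q-d)$ with $q = s(r-1)(s-r-1)$ and $d = r(s-r)(s-2)(s-3)$; the name `delta_zx` and the dictionary layout are illustrative only.

```python
def delta_zx(r, s):
    """delta for the distribution of sample means under a normal parent, per (22)."""
    q = s * (r - 1) * (s - r - 1)
    d = r * (s - r) * (s - 2) * (s - 3)
    return 2 * q / (q - d)

# Keys are (k, r) with s = k * r; values are rounded to four decimals as in the table.
table = {(k, r): round(delta_zx(r, k * r), 4)
         for k in (10, 5, 2) for r in (100, 50, 10)}
```

The special rows of the table follow from the factor q: it vanishes when r = 1 or r = s - 1, so that $\delta_{z_x} = 0$ in both cases.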

From the above table we observe:

1) For an infinite normal parent population, the frequency distribution of the
hypothetical means of the parent population is normal, because both $\alpha_{3:M_x}$ and
$\delta_{z_x}$ are equal to 0 (See Part I, Section III).

2) For any finite, normal parent population, if r = 1, the frequency distribution
of the hypothetical means of the parent population is normal.

3) For any finite, normal parent population, if a sample r = s - 1 is chosen,
the frequency distribution of the hypothetical means of the parent population
is normal.

4) For any finite, normal parent population, if s is equal to 5r or more and at
the same time r is at least equal to fifty, the normal curve is a fair approximation
for the distribution of the hypothetical means of the parent population.

5) For the other cases in which $|\delta_{z_x}|$ is not negligibly small, we ought to make
further investigation.

Now, to carry out further investigation for the cases where $|\delta_{z_x}|$ is not very
small, we need only look back to formulae (43), from which we observe that:

$\alpha_{4:M_x} - 3 < 0$ for $s \neq r + 1$, $r \neq 1$, or s does not approach infinity.

Because of the fact that $\alpha_{3:M_x} = 0$ and $\alpha_{4:M_x} < 3$ is the criterion for Type II,*
we conclude that Type II will be the best fitting curve for the cases mentioned
in 5) above. To obtain this Type II curve we proceed as follows:

Let the equation of the curve associated with the distribution of the hypothetical
means of the parent population with which we are concerned be
$y = f(t)$. Then

$\dfrac{1}{y}\dfrac{dy}{dt} = \dfrac{a-t}{b_0 + b_1 t + b_2 t^2} = \dfrac{a-t}{-b_2(t+R)(R-t)}$

where

$R = \dfrac{-b_1 \pm \sqrt{b_1^2 - 4b_0 b_2}}{2b_2}$

By proper substitution with the formulae in Part I, Section III, we obtain

(45) $R = \dfrac{\pm\sqrt{-4\delta_{z_x}(2+\delta_{z_x})}}{2\delta_{z_x}}$, since $\alpha_{3:M_x} = 0$ from (44)

For the same reason $a = -\dfrac{\alpha_{3:M_x}}{2(1+2\delta_{z_x})} = 0$; therefore the differential equation
may be rewritten as

$\dfrac{1}{y}\dfrac{dy}{dt} = \dfrac{t}{b_2(R^2 - t^2)}$

from which we obtain

(46) $y = y_0(R^2 - t^2)^q$ where $q = -\dfrac{1}{2b_2} = -\dfrac{1+2\delta_{z_x}}{\delta_{z_x}}$

Imposing the condition that the total area under the curve be equal to unity,
we set

$1 = \displaystyle\int_{-R}^{R} y\,dt = y_0\int_{-R}^{R}(R^2 - t^2)^q\,dt$


* Elderton, W. P., op. cit., Table VI, opposite p. 46.





Substituting $t = -R + 2Rn$, we have

$1 = y_0(2R)^{2q+1}B(q+1, q+1)$

hence

$y_0 = \dfrac{1}{(2R)^{2q+1}}\cdot\dfrac{\Gamma(2q+2)}{\Gamma(q+1)\,\Gamma(q+1)}$

(47)
$y = \dfrac{\Gamma(2q+2)}{(2R)^{2q+1}\,\Gamma(q+1)\,\Gamma(q+1)}(R^2 - t^2)^q$

$\quad = \dfrac{\Gamma(2q+2)}{2^{2q+1}\sqrt{2q+3}\;\Gamma(q+1)\,\Gamma(q+1)}\left(1 - \dfrac{t^2}{2q+3}\right)^q$

where q may be expressed in terms of r and s by means of (46) and (22). Thus

(48) $q = -\dfrac{1}{\delta_{z_x}} - 2 = \dfrac{r(s-r)(s-2)(s-3) - 5s(r-1)(s-r-1)}{2s(r-1)(s-r-1)}$
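The Type II curve can be checked numerically. The sketch below assumes the exponent $q = [r(s-r)(s-2)(s-3) - 5s(r-1)(s-r-1)]/[2s(r-1)(s-r-1)]$ of (48), the range $R^2 = 2q + 3$, and the second form of (47) with t in standard units. The function names and the midpoint rule are illustrative choices; `lgamma` is used in place of the gamma function only to avoid overflow when q is large.

```python
from math import exp, lgamma, log

def type2_exponent(r, s):
    """Exponent q of (48); requires 2 <= r <= s - 2 so the denominator is nonzero."""
    num = r * (s - r) * (s - 2) * (s - 3) - 5 * s * (r - 1) * (s - r - 1)
    return num / (2 * s * (r - 1) * (s - r - 1))

def type2_density(t, r, s):
    """Ordinate of (47) at t, with t measured in standard units."""
    q = type2_exponent(r, s)
    r_squared = 2 * q + 3          # R**2 = 2q + 3
    if t * t >= r_squared:
        return 0.0                 # outside the finite range (-R, R)
    log_y0 = (lgamma(2 * q + 2) - 2 * lgamma(q + 1)
              - (2 * q + 1) * log(2) - 0.5 * log(r_squared))
    return exp(log_y0 + q * log(1 - t * t / r_squared))

def midpoint_integral(f, a, b, n=20000):
    """Plain midpoint rule, adequate for this smooth integrand."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Case s = 2r with r = 10, where delta is not negligible.
area = midpoint_integral(lambda t: type2_density(t, 10, 20), -5.0, 5.0)
second = midpoint_integral(lambda t: t * t * type2_density(t, 10, 20), -5.0, 5.0)
```

Under these assumptions both the total area and the second moment come out very close to unity, as they should for a curve expressed in standard units.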

To sum up: In describing the distribution of the hypothetical means of a
parent population from which our sample is chosen, we have the following
theorems:

Theorem XI. The frequency distribution of the hypothetical means of an
infinite, normal parent population is normal.

Theorem XII. The frequency distribution of the hypothetical means of a
finite, normal parent population is normal if r = s - 1.

Theorem XIII. The frequency distribution of the hypothetical means of a
finite, normal parent population is very nearly normal if s is equal to 5r or more
and r is at least equal to fifty.

Theorem XIV. The frequency distribution of the hypothetical means of a
finite, normal parent population is according to Type II for the cases in which
$|\delta_{z_x}|$ is not negligibly small.


Section IV. Probable Error of the Mean 

To measure the fluctuation of a sample mean from the true mean of the parent
population, it is customary to use the term “probable error” to denote the
expression:

(49) $E_M = 0.6745\,\dfrac{\sigma_x}{\sqrt{r}}$

where $\sigma_x$ is the standard deviation of the parent population. As the true value
of $\sigma_x$ is not known, it is the common practice to substitute for it the value
$\sqrt{\dfrac{r}{r-1}}\;\sigma_s$, where $\sqrt{\dfrac{r-1}{r}}\;\sigma_x$ is the square root of the expected value of the sample
second moment.

Therefore (49) is rewritten as

(50) $E_M = 0.6745\,\dfrac{\sigma_s}{\sqrt{r-1}}$

Still, it should be noted, this expression is an approximation. Now from our
theory of inverse sampling, as far as a normal parent population is assumed,
we have obtained for the probable error of the mean

(51) $E_M = 0.6745\,\dfrac{\sigma_s}{\sqrt{r+2}}$

where $\sigma_s$ is definitely the standard deviation of an observed sample. Although
for large r, (50) and (51) do not differ much, yet (51) is obtained directly in
terms of the standard deviation of an observed sample.

To illustrate, consider the same sample of the heights of 100 freshman students
(See Table IV) as obtained from an infinite parent population. Since the mean
is 67.99 and the standard deviation is 2.327058, the probable error of the
mean is

$E_M = 0.6745 \times \dfrac{2.327058}{\sqrt{102}} = .1554152$;

that is, $M_x = 67.99 \pm .1554152$, which shows that the chances are even that the
true mean of the parent population lies within the range 67.834,584,8 and
68.145,415,2.
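The two versions of the probable error can be compared side by side. This is a minimal sketch assuming (50) as $0.6745\,\sigma_s/\sqrt{r-1}$ and (51) as $0.6745\,\sigma_s/\sqrt{r+2}$; the function names are hypothetical.

```python
from math import sqrt

def probable_error_classical(sigma_s, r):
    """Probable error of the mean with the customary divisor r - 1, per (50)."""
    return 0.6745 * sigma_s / sqrt(r - 1)

def probable_error_inverse(sigma_s, r):
    """Probable error of the mean from inverse sampling, per (51)."""
    return 0.6745 * sigma_s / sqrt(r + 2)

# Sample of 100 heights with standard deviation 2.327058.
e50 = probable_error_classical(2.327058, 100)
e51 = probable_error_inverse(2.327058, 100)
```

For r = 100 the two results agree to about two significant figures, which illustrates the remark that (50) and (51) differ little for large r.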

Section V. Distribution of the Hypothetical Variances of the Parent Population

Recalling the fact we have stated in Part III, Section II, that the consideration
of the distribution of the second moments of samples about the most
probable value of the mean is equivalent to the consideration of a distribution
of sample means drawn from a parent population $y_1, y_2, y_3, \ldots, y_s$ where $y_i =
(x_i - m_1)^2$, since in a normal parent population $\hat{M}_x = m_1$ [See (24)], we can write
down in perfect analogy with (12) and (14)


(52) 

Now 


Mn:p = (— 1 )” Mtirzy 

Mp = rrh 


Mn:p = = Mn; -- 




since we have assumed the mean of the parent population to be its most probable 
value, i.e., mi. Hence by virtue of (52) and (34), the frequency distribution of 





the hypothetical variances of the parent population, which is assumed to be 
normal, is characterised by 


(S3) 


Af^:, = jra* 

_ = „ = j / 2(8 - r) , 

’* y r(8 - 1) * 




2(« - r) 
r(8 - 1) 


A 


3 

X 


since we assume the most probable value of the variance of the parent population 
to be its variance. 




2(8 ^ 2r) / 2(s ~ 1) 

s — 2 ^ r(8 — r) 




12 (« — 1) — 6s(r — 1) (s — r — 1) 

r(8 — r) (s — 2) (a — 3) 


For an infinite parent population, i.e., 8-^ », we have 


(54) 


MfLr.x = ^2 


0-u..* = 









r 


Now if we can find the equation of the curve associated with the distribution 
of the hypothetical variances of the parent population, we shall be able to 
ascertain the probability that a variance lies between certain specified limits 
after a sample is drawn from the parent population. 

For illustration, we will use the same sample of the heights of 100 freshman 
students (See Table IV) as selected from a parent population of 1000. 


We have s = 1000, r = 100
$m_1 = 67.99$
$m_2 = 5.4152$, or $\sigma_s = 2.327058$

From (37) we compute

$\delta_{z_y} = -.0098$

As $|\delta_{z_y}|$ is negligibly small, we may be justified in considering $\hat{\mu}_{2:x}$ to be
distributed according to Type III (Part I, Section III).

It follows from (39) that

$\hat{\mu}_{2:x} = 5.32$


We compute the moments of the distribution of the hypothetical variances in
accordance with (53). Thus

mean = 5.4152
standard deviation = .556
$\alpha_3$ = .239,946
$\alpha_4$ = 3.055,75

If we now wish to ascertain the probability that the variance of the parent
population lies between $\mu_{2:x} = a = 5.5$ and $\mu_{2:x} = b = 6.5$, we first convert a, b
into standard units such that $t_a = .1525$ and $t_b = 1.9511$ and then evaluate the
following integral:*



But this step is now not necessary since we have access to Tables of Pearson's
Type III Function.† Hence we find from this table our desired probability.

P = .39146

In the above numerical example, we are justified in using Type III because
$|\delta_{z_y}|$ is negligibly small. But for the general case, however, we ought to make
further investigation concerning the values of $\delta_{z_y}$.


TABLE VI

Relation of Values of $\delta_{z_y}$ with Various Combinations of r and s

s = 10r     $r \geq 100$,   $\delta_{z_y} \geq -.0098$
            $r \geq 50$,    $\delta_{z_y} \geq -.0194$
            $r \geq 10$,    $\delta_{z_y} \geq -.0859$

s = 5r      $r \geq 100$,   $\delta_{z_y} \geq -.0200$
            $r \geq 50$,    $\delta_{z_y} \geq -.0400$
            $r \geq 10$,    $\delta_{z_y} \geq -.1983$

s = 2r      $r \geq 100$,   $\delta_{z_y} \geq -.0518$
            $r \geq 50$,    $\delta_{z_y} \geq -.1073$
            $r \geq 10$,    $\delta_{z_y} \geq -.7642$

$s \to \infty$, r = any finite value, $\delta_{z_y} = 0$.

* Elderton, W. P., op. cit., p. 90.

† Salvosa, L. R., Tables of Pearson's Type III Function, Annals of Mathematical
Statistics, Vol. I, No. II.


Recalling that $\delta_{z_y}$ is a function of r and s such that

$\delta_{z_y} = 2 - \dfrac{(s-3)\{4(s-2r)^2(s-1) + 2r(s-2)^2(s-r)\}}{(s-2)\{2(s-1)(s^2+s-6rs+6r^2) + r(s-r)(s-2)(s-3) - s(r-1)(s-r-1)\}}$

we construct Table VI of $\delta_{z_y}$ for different combinations of s and r.

From Table VI we observe the following facts.

1) For an infinite, normal parent population, the distribution of the hypothetical
variances of the parent population is according to Type III.

2) For a finite, normal parent population, if s is at least equal to 5r and r
at least fifty, the distribution of the hypothetical variances of the parent population
is very nearly according to Type III.

3) For the other cases in which $\delta_{z_y}$ is not small but negative in sign, the
distribution of the hypothetical variances of the parent population needs further
investigation.

From Part I, Section III, $\kappa = \dfrac{\alpha_3^2}{4\delta(2+\delta)}$; and since we know that $\delta$ is always
greater than $-2$, therefore whether $\kappa$ is positive or negative depends upon
whether $\delta$ is positive or negative.

Now from Table VI we observe that $\delta_{z_y}$ seems to be always negative; hence $\kappa$
is negative. In accordance with the criterion for fitting curves, the frequency
distribution of the variances of a normal parent population in such cases is
according to Type I, which takes the form:*

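(55) $y = \dfrac{\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\,\Gamma(m_2+1)}\cdot\dfrac{(t-R_2)^{m_1}(R_1-t)^{m_2}}{(R_1-R_2)^{m_1+m_2+1}}$

where

$m_1 = \dfrac{a-R_2}{b_2(R_2-R_1)}, \qquad m_2 = \dfrac{a-R_1}{b_2(R_1-R_2)}$

$R_1$, $R_2$ are the positive and negative roots, respectively, of the equation $b_0 +
b_1 t + b_2 t^2 = 0$ and can be expressed in terms of the first four moments:

$R_1, R_2 = \dfrac{-\alpha_3 \pm \sqrt{\alpha_3^2 - 4\delta(2+\delta)}}{2\delta}$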

We may sum up the foregoing in the following theorems: 

Theorem XV. The frequency distribution of the hypothetical variances of an
infinite, normal parent population is according to Type III.

Theorem XVI. The frequency distribution of the hypothetical variances of a
finite, normal parent population approximates to Type III Curve if r and s are
of such combinations that $|\delta_{z_y}|$ turns out to be negligibly small.

Theorem XVII. The frequency distribution of the hypothetical variances of
a finite, normal parent population is according to Type I in case that $\delta_{z_y}$ is not
very nearly equal to zero and is negative.

* Elderton, W. P., op. cit., p. 54.

Part IV. Inverse Sampling Associated with a Parent Population Distributed
According to Pearson’s Type III Function

Instead of a normal parent population as we have assumed throughout our 
discussion in Part III, we shall assume in this part a parent population which is 
distributed according to Type III. Therefore, besides the distribution of the 
hypothetical means and that of the hypothetical variances of the parent popula¬ 
tion, the distribution of the hypothetical third moments will also be considered. 
We shall carry out our discussion in practically the same way as we have done in 
Part III. 

Section I. Most Probable Value of the Mean of the Parent Population 

We have already obtained a general expression for the most probable value of
the mean of the parent population:

$\hat{M}_x = m_1 - \dfrac{s-2r}{r(s-2)}\cdot\dfrac{\sigma_x\,\alpha_{3:x}}{2(1+2\delta_{z_x})}$

where as before

$\delta_{z_x} = \dfrac{2\alpha_{4:z_x} - 3\alpha_{3:z_x}^2 - 6}{\alpha_{4:z_x} + 3}$

But we are now concerned with a parent population which is distributed accord¬ 
ing to Type III. 

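Since the recursion relation of the moments of Type III distribution is of the
form

(56) $\alpha_{n+1} = n\left(\alpha_{n-1} + \dfrac{\alpha_3}{2}\alpha_n\right)$

$\alpha_4 = 3(1+\gamma)$ where $\gamma = \dfrac{\alpha_3^2}{2}$
$\alpha_5 = 2\alpha_3(5+3\gamma)$
$\alpha_6 = 5(3+13\gamma+6\gamma^2)$
$\alpha_7 = 3\alpha_3(35+77\gamma+30\gamma^2)$
$\alpha_8 = 7(15+170\gamma+261\gamma^2+90\gamma^3)$
$\alpha_9 = 4\alpha_3(315+1652\gamma+2007\gamma^2+630\gamma^3)$
$\alpha_{10} = 9(105+2450\gamma+8435\gamma^2+8658\gamma^3+2520\gamma^4)$
$\alpha_{11} = 5\alpha_3(3465+35266\gamma+91971\gamma^2+82962\gamma^3+22680\gamma^4)$
$\alpha_{12} = 11(945+39375\gamma+252245\gamma^2+537777\gamma^3+437490\gamma^4+113400\gamma^5)$
etc.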





it follows from (5) that for a Type III distribution of the parent population

(57)
$M_{z_x} = M_x$

$\sigma_{z_x} = \sigma_x\sqrt{\dfrac{s-r}{r(s-1)}}$

$\alpha_{3:z_x} = \dfrac{s-2r}{s-2}\sqrt{\dfrac{s-1}{r(s-r)}}\;\alpha_{3:x}$

$\alpha_{4:z_x} - 3 = \dfrac{(s-1)(s^2+s-6rs+6r^2)}{r(s-r)(s-2)(s-3)}\cdot\dfrac{3\alpha_{3:x}^2}{2} - \dfrac{6s(r-1)(s-r-1)}{r(s-r)(s-2)(s-3)}$
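The closed forms listed under (56) can be regenerated from the recursion itself. The sketch below assumes the recursion in the form $\alpha_{n+1} = n(\alpha_{n-1} + \tfrac{1}{2}\alpha_3\alpha_n)$ with $\alpha_0 = 1$, $\alpha_1 = 0$, $\alpha_2 = 1$, and uses exact rational arithmetic; the function name `type3_alphas` and the trial value $\alpha_3 = \tfrac{1}{2}$ are illustrative choices.

```python
from fractions import Fraction

def type3_alphas(a3, n_max):
    """Standardized moments alpha_0 .. alpha_n_max of a Type III curve with skewness a3."""
    alpha = [Fraction(1), Fraction(0), Fraction(1)]   # alpha_0, alpha_1, alpha_2
    for n in range(2, n_max):
        # alpha_{n+1} = n * (alpha_{n-1} + (a3 / 2) * alpha_n)
        alpha.append(n * (alpha[n - 1] + Fraction(a3) / 2 * alpha[n]))
    return alpha

a3 = Fraction(1, 2)
g = a3 * a3 / 2                      # gamma = alpha_3**2 / 2
alphas = type3_alphas(a3, 12)
```

Evaluating the polynomials in $\gamma$ at the same trial value and comparing them with `alphas` gives an exact check of each coefficient in the list.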


Therefore for the most probable value of the mean of the parent population,
we have the same form as (20):

$\hat{M}_x = m_1 - \dfrac{s-2r}{r(s-2)}\cdot\dfrac{\sigma_x\,\alpha_{3:x}}{2(1+2\delta_{z_x})}$,

except now instead of (17)

(58)
$\delta_{z_x} = 2 - \dfrac{3\alpha_{3:z_x}^2 + 12}{\alpha_{4:z_x} + 3}$

$\quad = 2 - \dfrac{(s-3)\{2(s-1)(s-2r)^2\alpha_{3:x}^2 + 8r(s-2)^2(s-r)\}}{(s-2)\{(s-1)(s^2+s-6rs+6r^2)\alpha_{3:x}^2 + 4r(s-2)(s-3)(s-r) - 4s(r-1)(s-r-1)\}}$

We observe that if $\alpha_{3:x} = 0$, this comes back to the case of normal parent
population which we have already treated in Part III.

But if $s \to \infty$ while $\alpha_{3:x}$ is finite, then $\delta_{z_x} = 0$. Therefore, for the limiting
case, i.e., when the parent population is infinite, we have

(59) $\hat{M}_x = m_1 - \dfrac{\sigma_x\,\alpha_{3:x}}{2r}$

Since $\sigma_x$ and $\alpha_{3:x}$ are not known, we impose the condition that they assume
their best approximated ‘most probable values’ respectively. Hence, we rewrite
(20), (59) in the following forms:

(60) $\hat{M}_x = m_1 - \dfrac{s-2r}{r(s-2)}\cdot\dfrac{\hat{\sigma}_x\,\hat{\alpha}_{3:x}}{2(1+2\hat{\delta}_{z_x})}$

where now

(60b) $\hat{\delta}_{z_x} = 2 - \dfrac{(s-3)\{2(s-1)(s-2r)^2\hat{\alpha}_{3:x}^2 + 8r(s-2)^2(s-r)\}}{(s-2)\{(s-1)(s^2+s-6rs+6r^2)\hat{\alpha}_{3:x}^2 + 4r(s-2)(s-3)(s-r) - 4s(r-1)(s-r-1)\}}$

and for the infinite case

(61) $\hat{M}_x = m_1 - \dfrac{\hat{\sigma}_x\,\hat{\alpha}_{3:x}}{2r}$

So we state our theorem: 

Theorem XVIII. For a parent population which is distributed according to
Type III, the best approximated ‘most probable value’ of the mean is the mean
of an observed sample from it minus a correction factor which is a function of
r, s, $\hat{\sigma}_x$, and $\hat{\alpha}_{3:x}$.

It is also interesting to note that when $s = 2r$ and $\hat{\delta}_{z_x} \neq -\tfrac{1}{2}$, then $\hat{M}_x = m_1$.

Section II. Most Probable Value of the Standard Deviation of the Parent Population

We consider, as we have done in Part III, Section II, ${}_sC_r$ possible samples,
each consisting of r variates chosen from a parent population s. The second
moment of each sample computed about the most probable value of the mean of
the parent population may be written as

$z_1 = \dfrac{1}{r}\{(x_1-\hat{M}_x)^2 + (x_2-\hat{M}_x)^2 + \cdots + (x_r-\hat{M}_x)^2\}$

$z_2 = \dfrac{1}{r}\{(x_1-\hat{M}_x)^2 + (x_3-\hat{M}_x)^2 + \cdots + (x_{r+1}-\hat{M}_x)^2\}$

$\cdots$

$z_{{}_sC_r} = \dfrac{1}{r}\{(x_{s-r+1}-\hat{M}_x)^2 + (x_{s-r+2}-\hat{M}_x)^2 + \cdots + (x_s-\hat{M}_x)^2\}$

If we write $(x_i - \hat{M}_x)^2 = y_i$, the above may be considered as a distribution of
sample means drawn from a parent population $y_1, y_2, y_3, \ldots, y_s$. Therefore,
as (27),

$\mu'_{n:y} = \mu_{2n:x} + \binom{2n}{1}\mu_{2n-1:x}(M_x-\hat{M}_x) + \binom{2n}{2}\mu_{2n-2:x}(M_x-\hat{M}_x)^2$

$\qquad + \binom{2n}{3}\mu_{2n-3:x}(M_x-\hat{M}_x)^3 + \cdots + (M_x-\hat{M}_x)^{2n}$

When we impose the condition that the most probable value of the mean of
the parent population be its mean, then the above yields

$\mu'_{n:y} = \mu_{2n:x}$

$\mu'_{1:y} = M_y = \mu_{2:x}$


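Consequently

$\mu_{k:y} = \dfrac{\sum (y-M_y)^k}{s} = \mu'_{k:y} - \binom{k}{1}\mu'_{k-1:y}M_y + \binom{k}{2}\mu'_{k-2:y}M_y^2 - \cdots + (-1)^k M_y^k$

$\quad = \mu_{2k:x} - \binom{k}{1}\mu_{2k-2:x}\,\mu_{2:x} + \binom{k}{2}\mu_{2k-4:x}\,\mu_{2:x}^2 - \cdots + (-1)^k\mu_{2:x}^k.$

Now from the fact that we assume a Type III distribution for our parent
population, we have

(62)
$\mu_{2:y} = \mu_{4:x} - \mu_{2:x}^2 = (3\gamma+2)\sigma_x^4$

$\mu_{3:y} = \mu_{6:x} - 3\mu_{4:x}\mu_{2:x} + 2\mu_{2:x}^3 = (30\gamma^2+56\gamma+8)\sigma_x^6$

$\mu_{4:y} = \mu_{8:x} - 4\mu_{6:x}\mu_{2:x} + 6\mu_{4:x}\mu_{2:x}^2 - 3\mu_{2:x}^4 = (630\gamma^3+1707\gamma^2+948\gamma+60)\sigma_x^8$

Substituting (62) into (26), we have

(63)
$M_{z_y} = \sigma_x^2$

$\sigma_{z_y} = \sigma_x^2\sqrt{\dfrac{s-r}{r(s-1)}(3\gamma+2)}$

$\alpha_{3:z_y} = \dfrac{s-2r}{s-2}\sqrt{\dfrac{s-1}{r(s-r)}}\cdot\dfrac{30\gamma^2+56\gamma+8}{(3\gamma+2)^{3/2}}$

$\alpha_{4:z_y} - 3 = \dfrac{(s-1)(s^2+s-6rs+6r^2)}{r(s-r)(s-2)(s-3)}\cdot\dfrac{630\gamma^3+1680\gamma^2+912\gamma+48}{(3\gamma+2)^2} - \dfrac{6s(r-1)(s-r-1)}{r(s-r)(s-2)(s-3)}$

For an infinite parent population, the above yields by allowing $s \to \infty$

(64)
$M_{z_y} = \sigma_x^2$

$\sigma_{z_y} = \sigma_x^2\sqrt{\dfrac{3\gamma+2}{r}}$

$\alpha_{3:z_y} = \dfrac{1}{\sqrt{r}}\cdot\dfrac{30\gamma^2+56\gamma+8}{(3\gamma+2)^{3/2}}$

$\alpha_{4:z_y} - 3 = \dfrac{1}{r}\cdot\dfrac{630\gamma^3+1680\gamma^2+912\gamma+48}{(3\gamma+2)^2}$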
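The polynomials in $\gamma$ appearing in (62) and (63) can be verified from the Type III moment recursion. The sketch below assumes the recursion $\alpha_{n+1} = n(\alpha_{n-1} + \tfrac{1}{2}\alpha_3\alpha_n)$ of (56) and takes $\sigma_x = 1$, so that $\mu_{n:x} = \alpha_n$; the function names and the trial skewness $\tfrac{2}{5}$ are hypothetical.

```python
from fractions import Fraction

def type3_alphas(a3, n_max):
    """Standardized Type III moments alpha_0 .. alpha_n_max for skewness a3."""
    alpha = [Fraction(1), Fraction(0), Fraction(1)]
    for n in range(2, n_max):
        alpha.append(n * (alpha[n - 1] + Fraction(a3) / 2 * alpha[n]))
    return alpha

def y_central_moments(a3):
    """Second, third and fourth moments of y = (x - M_x)**2 about its mean, sigma_x = 1."""
    a = type3_alphas(a3, 8)
    mu2 = a[4] - 1
    mu3 = a[6] - 3 * a[4] + 2
    mu4 = a[8] - 4 * a[6] + 6 * a[4] - 3
    return mu2, mu3, mu4

a3 = Fraction(2, 5)
g = a3 * a3 / 2                      # gamma = alpha_3**2 / 2
mu2, mu3, mu4 = y_central_moments(a3)
excess = mu4 / mu2**2 - 3            # alpha_{4:y} - 3
```

The last quantity is the factor $\alpha_{4:y} - 3$ that enters (63), which is why its numerator differs from the numerator of $\mu_{4:y}$ in (62).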





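In accordance with (38), we write

(65) $\check{z}_y = \sigma_x^2 - \sigma_x^2\sqrt{\dfrac{(s-r)(3\gamma+2)}{r(s-1)}}\cdot\dfrac{\dfrac{s-2r}{s-2}\sqrt{\dfrac{s-1}{r(s-r)}}\cdot\dfrac{30\gamma^2+56\gamma+8}{(3\gamma+2)^{3/2}}}{2(1+2\delta_{z_y})}$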


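It follows from Theorem IV that for the mode of the standard deviation of the
parent population, we have

(66) $\dfrac{\sum (x-\hat{M}_x)^2}{r} = \sigma_x^2 + \sigma_x^2\sqrt{\dfrac{(s-r)(3\gamma+2)}{r(s-1)}}\cdot\dfrac{\dfrac{s-2r}{s-2}\sqrt{\dfrac{s-1}{r(s-r)}}\cdot\dfrac{30\gamma^2+56\gamma+8}{(3\gamma+2)^{3/2}}}{2(1+2\delta_{z_y})}$

where

(67) $\delta_{z_y} = \dfrac{2\alpha_{4:z_y} - 3\alpha_{3:z_y}^2 - 6}{\alpha_{4:z_y} + 3}$

which is a function of r, s, and $\alpha_{3:x}$.

Assuming the best approximated ‘most probable value’ of $\alpha_{3:x}$ for $\alpha_{3:x}$ and
remembering that

$\hat{M}_x = m_1 - \dfrac{s-2r}{r(s-2)}\cdot\dfrac{\hat{\sigma}_x\,\hat{\alpha}_{3:x}}{2(1+2\hat{\delta}_{z_x})}$

we write (66) in the form of

$m_2 + g^2\dfrac{\hat{\sigma}_x^2\,\hat{\alpha}_{3:x}^2}{(1+2\hat{\delta}_{z_x})^2} = \hat{\sigma}_x^2 + g\,\dfrac{30\hat{\gamma}^2+56\hat{\gamma}+8}{(3\hat{\gamma}+2)(1+2\hat{\delta}_{z_y})}\,\hat{\sigma}_x^2$

(68) $\hat{\sigma}_x^2 = \dfrac{m_2}{1 + g\,\dfrac{30\hat{\gamma}^2+56\hat{\gamma}+8}{(3\hat{\gamma}+2)(1+2\hat{\delta}_{z_y})} - \dfrac{2g^2\hat{\gamma}}{(1+2\hat{\delta}_{z_x})^2}}$

where

$g = \dfrac{s-2r}{2r(s-2)}$,

$\hat{\delta}_{z_x}$ = (60b) where $\alpha_{3:x}$ is replaced by $\hat{\alpha}_{3:x}$,
$\hat{\delta}_{z_y}$ = (67) where $\alpha_{3:x}$ is replaced by $\hat{\alpha}_{3:x}$,

$\hat{\gamma} = \dfrac{\hat{\alpha}_{3:x}^2}{2}$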


We rewrite (68) in the abridged form:

$\hat{\sigma}_x = \dfrac{\sigma_s}{\phi(\hat{\alpha}_{3:x}, r, s)}$

where

$\phi(\hat{\alpha}_{3:x}, r, s) = \sqrt{1 + g\,\dfrac{30\hat{\gamma}^2+56\hat{\gamma}+8}{(3\hat{\gamma}+2)(1+2\hat{\delta}_{z_y})} - \dfrac{2g^2\hat{\gamma}}{(1+2\hat{\delta}_{z_x})^2}}$

and state our theorem:

Theorem XIX. The best approximated ‘most probable value’ of the standard
deviation of a parent population which is assumed to be distributed according to
Type III is equal to the standard deviation of an observed sample of it, multiplied
by $\dfrac{1}{\phi(\hat{\alpha}_{3:x}, r, s)}$.

For an infinite parent population, $g = \dfrac{1}{2r}$, $\hat{\delta}_{z_x} = 0$ and

$\hat{\delta}_{z_y} = \dfrac{2(630\hat{\gamma}^3+1680\hat{\gamma}^2+912\hat{\gamma}+48)(3\hat{\gamma}+2) - 3(30\hat{\gamma}^2+56\hat{\gamma}+8)^2}{(3\hat{\gamma}+2)[(630\hat{\gamma}^3+1680\hat{\gamma}^2+912\hat{\gamma}+48) + 6r(3\hat{\gamma}+2)^2]}$

Theorem XX. The best approximated ‘most probable value’ of the standard
deviation of an infinite parent population which is assumed to be distributed
according to Type III is equal to the standard deviation of an observed sample
of it, multiplied by $\dfrac{1}{\lim_{s\to\infty}\phi(\hat{\alpha}_{3:x}, r, s)}$.

Section III. Most Probable Value of the Skewness of the Parent Population

Let us again consider ${}_sC_r$ samples, each consisting of r variates chosen from
a parent population s. The third moments of each sample computed about the
most probable value of the mean of the parent population may be written as

(70)
$z_1 = \dfrac{1}{r}\{(x_1-\hat{M}_x)^3 + (x_2-\hat{M}_x)^3 + \cdots + (x_r-\hat{M}_x)^3\}$

$z_2 = \dfrac{1}{r}\{(x_1-\hat{M}_x)^3 + (x_3-\hat{M}_x)^3 + \cdots + (x_{r+1}-\hat{M}_x)^3\}$

$\cdots$

$z_{{}_sC_r} = \dfrac{1}{r}\{(x_{s-r+1}-\hat{M}_x)^3 + (x_{s-r+2}-\hat{M}_x)^3 + \cdots + (x_s-\hat{M}_x)^3\}$


If we write $(x_i - \hat{M}_x)^3 = w_i$, the above may be considered as a distribution
of sample means drawn from a parent population $w_1, w_2, w_3, \ldots, w_s$. Consequently
in accordance with (6), we have

(71)
$M_{z_w} = M_w$

$\sigma_{z_w} = \sigma_w\sqrt{\dfrac{s-r}{r(s-1)}}$

$\alpha_{3:z_w} = \dfrac{s-2r}{s-2}\sqrt{\dfrac{s-1}{r(s-r)}}\;\alpha_{3:w}$

$\alpha_{4:z_w} - 3 = \dfrac{(s-1)(s^2+s-6rs+6r^2)}{r(s-r)(s-2)(s-3)}\{\alpha_{4:w}-3\} - \dfrac{6s(r-1)(s-r-1)}{r(s-r)(s-2)(s-3)}$


Let us write the analogous form of (27): 


(72) 


Hn-.u, = (^ - - ^*)*" 

= iizn:. + (^fjiizn-vAM.- M,) 

+ + ■■■ + iM .- a7.)»" 


Imposing the same condition as before that $M_x$ assumes its most probable
value (i.e., $M_x = \hat{M}_x$), then (72) becomes

\[
\mu'_{n:w} = \mu_{3n:x}, \qquad M_w = \mu'_{1:w} = \mu_{3:x}
\tag{73}
\]

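The kth moment of the distribution of w about its mean will then be

\[
\begin{aligned}
\mu_{k:w} &= \mu'_{k:w} - \binom{k}{1}\mu'_{k-1:w}M_w + \binom{k}{2}\mu'_{k-2:w}M_w^2 - \cdots + (-1)^k M_w^k \\
&= \mu_{3k:x} - \binom{k}{1}\mu_{3k-3:x}\,\mu_{3:x} + \binom{k}{2}\mu_{3k-6:x}\,\mu_{3:x}^2 - \cdots + (-1)^k \mu_{3:x}^k
\end{aligned}
\tag{74}
\]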





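Since we assume a Type III distribution of the parent population, we have
in accordance with the recursion relation (56)

\[
\begin{aligned}
M_w &= \mu_{3:x} = \alpha_{3:x}\,\sigma_x^3 \\
\mu_{2:w} &= \mu_{6:x} - \mu_{3:x}^2 = \sigma_x^6\,(15 + 63\gamma + 30\gamma^2) \\
\mu_{3:w} &= \mu_{9:x} - 3\mu_{6:x}\,\mu_{3:x} + 2\mu_{3:x}^3
 = \alpha_{3:x}\,\sigma_x^9\,(1215 + 6417\gamma + 7938\gamma^2 + 2520\gamma^3) \\
\mu_{4:w} &= \mu_{12:x} - 4\mu_{9:x}\,\mu_{3:x} + 6\mu_{6:x}\,\mu_{3:x}^2 - 3\mu_{3:x}^4 \\
&= \sigma_x^{12}\,(10395 + 423225\gamma + 2722599\gamma^2 + 5851683\gamma^3 + 4792230\gamma^4 + 1247400\gamma^5)
\end{aligned}
\tag{75}
\]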
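The polynomials in (75) can be verified numerically. The sketch below is not taken from the paper: it assumes that $\gamma = \alpha_{3:x}^2/2$ and that the standardized Type III variate is a shifted and scaled gamma variate with shape $k = 4/\alpha_{3:x}^2$; the central moments of the gamma variate are generated by the standard recursion $m_{n+1} = n(m_n + k\,m_{n-1})$, which may differ in form from the recursion relation (56). All function names are hypothetical.

```python
def gamma_central_moments(k, order):
    """Central moments m_0..m_order of a gamma variate with shape k and unit
    scale, from the recursion m_{n+1} = n (m_n + k m_{n-1})."""
    m = [1.0, 0.0]
    for n in range(1, order):
        m.append(n * (m[n] + k * m[n - 1]))
    return m


def w_moments(alpha3):
    """Central moments of w = x^3 for a standardized Type III variate x."""
    k = 4.0 / alpha3 ** 2                       # shape giving skewness alpha3
    m = gamma_central_moments(k, 12)
    mu = [m[n] / k ** (n / 2) for n in range(13)]   # standardized moments of x
    mu2_w = mu[6] - mu[3] ** 2
    mu3_w = mu[9] - 3 * mu[6] * mu[3] + 2 * mu[3] ** 3
    mu4_w = mu[12] - 4 * mu[9] * mu[3] + 6 * mu[6] * mu[3] ** 2 - 3 * mu[3] ** 4
    return mu2_w, mu3_w, mu4_w


def polynomials_75(alpha3):
    """Right-hand sides of (75) with sigma_x = 1 (gamma is an assumed meaning)."""
    g = alpha3 ** 2 / 2.0
    p2 = 15 + 63 * g + 30 * g ** 2
    p3 = alpha3 * (1215 + 6417 * g + 7938 * g ** 2 + 2520 * g ** 3)
    p4 = (10395 + 423225 * g + 2722599 * g ** 2 + 5851683 * g ** 3
          + 4792230 * g ** 4 + 1247400 * g ** 5)
    return p2, p3, p4
```

Under these assumptions the two functions return the same three numbers for any positive skewness, which also confirms the coefficients that are hard to read in worn copies of the text.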

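Substituting into (71), we have

\[
\begin{aligned}
M_z &= \mu_{3:x} = \alpha_{3:x}\,\sigma_x^3 \\
\sigma_z &= \sigma_x^3 \sqrt{\frac{s-r}{r(s-1)}\,(15 + 63\gamma + 30\gamma^2)} \\
\alpha_{3:z} &= \frac{s-2r}{s-2}\sqrt{\frac{s-1}{r(s-r)}}\;
 \frac{\alpha_{3:x}\,(1215 + 6417\gamma + 7938\gamma^2 + 2520\gamma^3)}{(15 + 63\gamma + 30\gamma^2)^{3/2}} \\
\alpha_{4:z} - 3 &= \frac{(s-1)(s^2+s-6rs+6r^2)}{r(s-r)(s-2)(s-3)}\;
 \frac{9720 + 417555\gamma + 2707992\gamma^2 + 5840343\gamma^3 + 4789530\gamma^4 + 1247400\gamma^5}{(15 + 63\gamma + 30\gamma^2)^2} \\
&\quad - \frac{6s(r-1)(s-r-1)}{r(s-r)(s-2)(s-3)}
\end{aligned}
\tag{76}
\]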

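Allowing $s \to \infty$, we have for an infinite parent population

\[
\begin{aligned}
M_z &= \mu_{3:x} = \alpha_{3:x}\,\sigma_x^3 \\
\sigma_z &= \frac{\sigma_x^3}{\sqrt{r}}\,(30\gamma^2 + 63\gamma + 15)^{1/2} \\
\alpha_{3:z} &= \frac{1}{\sqrt{r}}\;
 \frac{\alpha_{3:x}\,(1215 + 6417\gamma + 7938\gamma^2 + 2520\gamma^3)}{(15 + 63\gamma + 30\gamma^2)^{3/2}} \\
\alpha_{4:z} &= 3 + \frac{1}{r}\;
 \frac{9720 + 417555\gamma + 2707992\gamma^2 + 5840343\gamma^3 + 4789530\gamma^4 + 1247400\gamma^5}{(15 + 63\gamma + 30\gamma^2)^2}
\end{aligned}
\tag{77}
\]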






The best approximated 'most probable value' of $\mu_{3:x}$ may now be written
after the same fashion as in the preceding cases:

-?- 2(1 + a..) 

where 


5*1/) 


2ot4:*«) — 3as:, — 6 




Since 


E (x - M.y 1 


= - (x - Mi + g [from (60)], 

^ \ 1 + 25^^/ 

and since we assume the best approximated ‘most probable values' of the 
standard deviation and the skewness for the standard deviation and the skew¬ 
ness of the parent population respectively, we obtain from (78) 

A A A i A S 

aZ A _ . Q (^z ^:x I 3 ^3:1 

m. + + ». 

(1215 + 6417-y + + 2520^») . 3 

^ (1 + 28^) (15 + 63i +'30^“) 

The change of /la:* to aa:* involves a systematic error although it is small. 
Again by proper substitution of (69) we have 


^3 03:1 

<t>Kaf.x, r, s) 


= <rl as; + 3 <t\ g 


oi-.x 


</>(a3:i, r, s) (1 + 26»J 


+ g^ 


3 -*3 

<r, Os:* 


g 


Solving for az-,., we have 


<f>^{az-.x, r, s) (1 + 26, 

az-.x <tI (1215 + 6417-y + 7938-?^ + 2520f») 
(aa:*, r, s) ■ (15 + 63-? + 30f“) (1 + 26,J‘ 


(79) 


08:1 


^*(^: 


A^Ti- 

ia:*, r, s) L 




3g <f>^ (az:x, r, s) _ 

(1 + 25, J (1 + 26,J’ 

g (1215 + 6417-? + 7938-?^ + 252(>?») ~| 
■** (1 + 2D (15 + 63”? + 3(>?*) J' 


Since the right member of (79) is a function of $\alpha_{3:x}$, r, and s, the
most probable value of $\alpha_{3:x}$ may be approximated when we are given s, r, and
the skewness of an observed sample. As it is an algebraic equation of high
order in $\alpha_{3:x}$ and is so much involved, even approximation presents practical
difficulty. However, once $\hat{\alpha}_{3:x}$ is approximated, $\hat{M}_x$ and $\hat{\sigma}_x$ can be
easily obtained from (60) and (68).

Theorem XXI. For the best approximated 'most probable value' of the skewness
of a parent population which is assumed to be distributed according to
Type III, we must approximate it from equation (79), in which the skewness of
an observed sample is expressed as a function of s, r, and the best approximated
'most probable value' of the skewness of the parent population.

To construct a table for the best approximated 'most probable value' $\hat{\alpha}_{3:x}$
corresponding to the sample skewness $\alpha_{3:s}$ for particular values of r, s, we
should first reverse the process by assigning different values of $\hat{\alpha}_{3:x}$ so as
to obtain $\alpha_{3:s}$; then by way of interpolation, we shall be able to obtain
$\hat{\alpha}_{3:x}$ for a particular $\alpha_{3:s}$.

TABLE VII

Relation of the Sample Skewness and the Best Approximated 'Most Probable Value'
of the Parent Population Whose Distribution is According to Type III

(s → ∞, r = 100)

    Sample skewness      Best approximated
    $\alpha_{3:s}$       'most probable value' $\hat{\alpha}_{3:x}$

         .1                  .0784
         .2                  .1568
         .3                  .2373
         .4                  .3164
         .5                  .3969
         .6                  .4776
         .7                  .5589
         .8                  .6410
         .9                  .7239
        1.0                  .8072
        1.1                  .8905
        1.2                  .9737
        1.3                 1.0667
        1.4                 1.1392
        1.5                 1.2211
        1.6                 1.3022
        1.7                 1.3791
        1.8                 1.4578
        1.9                 1.5355
        2.0                 1.6122
        2.1                 1.6828
        2.2                 1.7609
        2.3                 1.8303
        2.4                 1.9024
        2.5                 1.9670
        2.6                 2.0371







For $s \to \infty$ and $r = 100$, we have computed the best approximated 'most
probable value' of $\alpha_{3:x}$ corresponding to the values of the sample skewness
from .1 to 2.6 as shown in Table VII.

The computation for such a table is laborious because it involves the
computation of $\delta_x$, $\delta_y$, and $\delta_z$, which are in turn functions of
$\alpha_{3:x}$ and $\alpha_{4:x}$, $\alpha_{3:y}$ and $\alpha_{4:y}$, and $\alpha_{3:z}$ and $\alpha_{4:z}$, respectively.


Section IV. Distribution of the Hypothetical Means of the Parent 

Population 


Since we have obtained in the preceding sections expressions for the best 
approximated 'most probable values’ of the mean, the standard deviation and 
the skewness of a parent population which is assumed to be distributed according 
to Type III, we are now in the position to characterize the distribution of the 
hypothetical means of the parent population with the assumption that the best 
approximated 'most probable values’ of the mean, the standard deviation, and 
the skewness be the mean, the standard deviation, and the skewness of the 
parent population. 

On the basis of the fundamental relations in (15), we write down the
characteristics of the distribution of the hypothetical means of the parent
population as follows:


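\[
\begin{aligned}
M_{M_x} &= m_1 \\
\sigma_{M_x} &= \frac{\sigma_s}{\phi(\alpha_{3:x}, s, r)}\sqrt{\frac{s-r}{r(s-1)}} \\
\alpha_{3:M_x} &= \frac{s-2r}{s-2}\sqrt{\frac{s-1}{r(s-r)}}\;\alpha_{3:x} \\
\alpha_{4:M_x} - 3 &= \frac{(s-1)(s^2+s-6rs+6r^2)}{r(s-r)(s-2)(s-3)}\,(\alpha_{4:x} - 3)
 - \frac{6s(r-1)(s-r-1)}{r(s-r)(s-2)(s-3)}
\end{aligned}
\tag{80}
\]

where $\phi(\alpha_{3:x}, s, r)$ is given in (69).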

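For an infinite parent population, by allowing $s \to \infty$, we obtain from the
above:

\[
\begin{aligned}
M_{M_x} &= m_1 \\
\sigma_{M_x} &= \frac{1}{\sqrt{r}}\,\frac{\sigma_s}{\phi(\alpha_{3:x}, r)} \\
\alpha_{3:M_x} &= \frac{1}{\sqrt{r}}\,\alpha_{3:x} \\
\alpha_{4:M_x} - 3 &= \frac{3}{2r}\,\alpha_{3:x}^2
\end{aligned}
\tag{81}
\]

where $\phi(\alpha_{3:x}, r) = \lim_{s \to \infty} \phi(\alpha_{3:x}, s, r)$.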





Since the moments of the distribution of the hypothetical means are expressed
in terms of $\alpha_{3:x}$, it is necessary for us to find the best approximated
'most probable value' of the skewness of a parent population before we attempt
to obtain the frequency function associated with the distribution of these
hypothetical means.

Numerical illustration. A sample of 100 weights of freshman students is 
observed and the frequency distribution is given in Table VIII. 


TABLE VIII

Weights of 100 Freshman Students
(Original Measurements Correct to Nearest Pound)

    Class Mark      Frequency

      109.5              4
      119.5             11
      129.5             25
      139.5             34
      149.5             14
      159.5              8
      169.5              0
      179.5              3
      189.5              1

      Total            100


The first four moments are computed:

$m_1 = 138.3$
$\sigma_s = 14.6366$
$\alpha_{3:s} = .81099$
$\alpha_{4:s} = 4.47644$
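The sample characteristics can be recomputed from Table VIII. The sketch below is an illustration rather than the author's worksheet: it assumes a class width of 10 pounds and applies Sheppard's corrections to the second and fourth moments, an assumption that is not stated in the text but that reproduces the printed values closely. The variable names are hypothetical.

```python
import math

class_marks = [109.5, 119.5, 129.5, 139.5, 149.5, 159.5, 169.5, 179.5, 189.5]
frequencies = [4, 11, 25, 34, 14, 8, 0, 3, 1]
width = 10.0  # assumed class width in pounds

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n


def central(p):
    """p-th central moment of the grouped data, before any correction."""
    return sum(f * (x - mean) ** p for f, x in zip(frequencies, class_marks)) / n


# Sheppard's corrections for grouping (assumed to have been applied).
mu2 = central(2) - width ** 2 / 12
mu3 = central(3)
mu4 = central(4) - width ** 2 * central(2) / 2 + 7 * width ** 4 / 240

sigma = math.sqrt(mu2)
alpha3 = mu3 / sigma ** 3
alpha4 = mu4 / mu2 ** 2
```

Without the corrections the standard deviation comes out near 14.92 rather than 14.6366, which is the reason for supposing that they were used.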

Now, assuming this sample is drawn from an infinite parent population which
is distributed according to Type III, we wish to find (a) the best
approximated 'most probable values' of the mean, the standard deviation, and
the skewness of the parent population, and (b) the probability that the mean of
the parent population lies between $M_x = 135$ and $M_x = 140$.

By interpolation from Table VII, we obtain the best approximated 'most 
probable value' of the skewness of the parent population: 

$\hat{\alpha}_{3:x} = .6501$
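The interpolation can be reproduced from the two rows of Table VII that bracket the sample skewness .81099. This is a minimal sketch assuming simple linear interpolation; the function name is hypothetical.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between the tabled points (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


# Rows .8 -> .6410 and .9 -> .7239 of Table VII bracket the sample skewness.
alpha_hat = interpolate(0.81099, 0.8, 0.6410, 0.9, 0.7239)
```

Rounded to four places the result agrees with the value .6501 quoted in the text.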


From (69) and (61) we obtain

$\hat{\sigma}_x = 14.5452$
$\hat{M}_x = 138.25272$
$\phi(\alpha_{3:x}, r) = 1.006279$





From (81) we have

$M_{M_x} = 138.3$
$\sigma_{M_x} = 1.45452$
$\alpha_{3:M_x} = .06501$
$\alpha_{4:M_x} = 3.00633945$
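These four values follow from the infinite-population relations in (81). The sketch below assumes the reading of (81) in which the standard deviation is the sample standard deviation divided by $\sqrt{r}\,\phi$, the skewness is divided by $\sqrt{r}$, and the excess of kurtosis is $3\alpha_{3:x}^2/2r$; the function and argument names are hypothetical.

```python
import math


def hypothetical_mean_characteristics(m1, sample_sd, phi, alpha_hat, r):
    """Mean, s.d., alpha_3 and alpha_4 of the hypothetical means (infinite case)."""
    sd = sample_sd / (phi * math.sqrt(r))
    a3 = alpha_hat / math.sqrt(r)
    a4 = 3 + 3 * alpha_hat ** 2 / (2 * r)
    return m1, sd, a3, a4


mean_M, sd_M, a3_M, a4_M = hypothetical_mean_characteristics(
    m1=138.3, sample_sd=14.6366, phi=1.006279, alpha_hat=0.6501, r=100)
```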

Since $\delta = 0$, the distribution of $M_x$ is associated with the Type III
Function; hence for the probability that $M_x$ lies between $M_x = 135$ and
$M_x = 140$, we again refer to Tables of Pearson's Type III Function prepared
by L. R. Salvosa, and we obtain in this case

P = .8677592
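The tabled probability can be approximated without tables. The sketch below is an illustration, not Salvosa's method: it treats the Type III curve as a shifted and scaled gamma density with shape $4/\alpha_3^2$ (an assumption about the parametrization) and integrates it by Simpson's rule. The function name and the number of steps are hypothetical choices.

```python
import math


def type3_probability(lo, hi, mean, sd, skew, steps=4000):
    """P(lo < X < hi) for a Type III variate with the given mean, s.d., skewness."""
    k = 4.0 / skew ** 2                 # gamma shape that yields this skewness
    root_k = math.sqrt(k)

    def to_gamma(x):
        # X = mean + sd * (G - k) / sqrt(k), with G a gamma variate of shape k
        return k + root_k * (x - mean) / sd

    def pdf(g):
        return math.exp((k - 1.0) * math.log(g) - g - math.lgamma(k))

    a, b = to_gamma(lo), to_gamma(hi)
    h = (b - a) / steps                 # steps must be even for Simpson's rule
    total = pdf(a) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(a + i * h)
    return total * h / 3.0


p = type3_probability(135.0, 140.0, mean=138.3, sd=1.45452, skew=0.06501)
```

The result should lie close to the value .8677592 read from the tables; small differences are to be expected from tabular interpolation.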

Since the determination of the best fit of a frequency curve in general depends
upon the values of $\alpha_3$, $\alpha_4$, and $\kappa$, and since in the present case each of
them is a function of s, r, and $\alpha_{3:x}$, we are therefore not able to tell the
type of curve to be used until we know s, r, and $\alpha_{3:x}$.

For the infinite case, however, as we have illustrated, the Type III Function
may always be used because

\[
\delta = \frac{2\alpha_{4:M_x} - 3\alpha_{3:M_x}^2 - 6}{\alpha_{4:M_x} + 3} = 0
\]

holds for all values of $\alpha_{3:x}$ and r. We therefore conclude that the hypothetical
means of an infinite parent population which is itself distributed according to
Type III are distributed according to Type III. Hence

Theorem XXII. The hypothetical means of an infinite parent population are
distributed according to Type III if the parent population is assumed to be
distributed according to Type III.

Section V. Distribution of the Hypothetical Variances of the 

Parent Population 

Parallel to Part III, Section V, the distribution of the hypothetical variances
of a parent population which is assumed to be distributed according to Type III
can be described. The fundamental relations of Theorems II and III hold:


But now Mp 


f^n:p — M2n:^y Ur a2n:p — OC^nity 

M2n+l:7> = —M2n+l:*v «2n+l:p = — Ot2n+\\Zy 


E (x - 
r 


(See Part IV, Section II) 


( 82 ) 


Mp = 




+ g 


OzOi-.x Y 

1 + 2lJ 


= m* + gs 


2 a2 


(1 + r, s) 


[from (60)]. 


Salvosa, L. R., Annals of Mathematical Statistics, Vol. I, No. II, 1930.





Upon the same assumption that the best approximated 'most probable values'
of the mean, the standard deviation, and the skewness be the mean, the standard
deviation, and the skewness of the parent population, the distribution of
$\mu_{2:x}$ is characterised by


(83)-^ 




_2 X* 


(l + 2«..)W8:.,r,s) 


= »ias 1 + 






(1 + r,s) 

3 


“ 7(7-% 




^*(a»:x, r, s) 


s — 2r / s — 1 

“*:*.:* “ s-2 y r(fi - 7) 

s - 2r / «- 1 30f* + 56-? + 8 

s - 2 y r(s - r) ■ (3-? + 2)< 

o_ 1 _ (s - 1) («* + «- 6rs) + 6r‘) f_ ,, 

^ r»\ i^ o\ l^4:y u) 


r(s ~ r) (« — 2) (s — 3) 

6s(r — 1) (s ~ r — 1) 
r(5 — r) (5 -• 2) (s — 3) 

_ (s - 1) (gi -I- g - 6rs + 6r») f 630-?^ + 16807'^ + 912-? + 48 ^ 


-r)(s-2)(s-3) L 


(3^ + 2)* 

6s (r — 1) (s — r — 1) 
r(s — r) (s — 2) (s — 3)‘ 


For an infinite parent population, we have 

1 


(84) 


= mil + 


A 2 
« 8 : 






- 


m2 


1 30 ^=» + 56 ^ + 8 
y/r (3^ -f 2)*'* 


„ 1 J'630-?‘ + 1680^* + 912-^ + 48' 

- r t-wTai- 


Numerical illustration. Using the same sample in Table VIII, we wish to
ascertain the probability that the variance of the parent population lies between
306.25 and 342.25. From $m_1 = 138.3$, $\sigma_s = 14.6366$, $\alpha_{3:s} = .81099$, and
$\alpha_{4:s} = 4.47644$, we find from (84)



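$M_{\mu_{2:x}} = 214.232,236$
$\sigma_{\mu_{2:x}} = 34.336,74$
$\alpha_{3:\mu_{2:x}} = -.496,311$
$\alpha_{4:\mu_{2:x}} = 3.463,675,7$
$\delta = .105,515,6$

From Part I, Section III,

\[
\kappa = \frac{\alpha_{3:\mu_{2:x}}^2}{4\delta(2 + \delta)} = .276 < 1
\]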


Therefore, the best fitting curve will be Type IV which assumes the form®* 


( 86 ) 


where 


y = J/o (1 + x*)-" e-^ 


X = 


I + p 


t being in standard units 


P = 
qi = 

m = 

X = 

yo = 


bi _ a» 

262 26 

4606, -b\ _ 46 (2 + 6) - al 
4b® 45* 

JL = 1 + 26 

262 6 

_ « + P 
biQ 


g {2m — 2, X) F{2m — 2, X) 


$y_0$ is found from Pearson's Tables for Statisticians and Biometricians to be
.049662.


Elderton, W. P., op. cit., p. 64.

Pearson, K., Tables for Statisticians and Biometricians, Vol. I, pp. 126-142.





Now the given limits 306.25 and 342.25 of the variance, when expressed in
standard units, are

$t_a = 2.679,941$
$t_b = 3.728,410$

Therefore the probability that $\mu_{2:x}$ lies between $\mu_{2:x} = 306.25$ and
$\mu_{2:x} = 342.25$ is

f ‘ J-J.728, 410 

P = yo f (1 + e-^ dx 

we find 

m = 11 477,271 
X = 12 940,307 

/ 86343 

(1 ^ ^2)-11.477271 ^-12.W0207 tan-ix^3. 

08767 

By means of the Maclaurin-Euler Interpolation Formula, P is found to be equal
to .000,904.

No definite law can be ascertained before we know $\alpha_{3:x}$ because, as we have
seen, $\alpha_{3:\mu_{2:x}}$ and $\alpha_{4:\mu_{2:x}}$ are both expressed in terms of s, r, and
$\alpha_{3:x}$. We do not know the value of $\kappa$, which is a determining factor of the
best fitting curve and a function of $\alpha_{3:\mu_{2:x}}$ and $\alpha_{4:\mu_{2:x}}$, until we know
the values of s, r, and $\alpha_{3:x}$.


Section VI. Distribution of the Hypothetical Third Moments of the
Parent Population About Its Mean 

Recalling the fact that the distribution of the third moments of sample means
about the most probable value of the mean of the parent population is equivalent
to the consideration of a distribution of sample means drawn from a parent
population $w_1, w_2, w_3, \cdots$, where $w_i = (x_i - \hat{M}_x)^3$, we can write down
in accordance with the fundamental relations stated in Theorems II and III:


A^n:p — 




or 


^2nip 


<Xjn-|-l:p — — ^2n+l:*tp 


(x — M y 

But here Mp = ^; and by the substitution of (60), we have 

r 

= TWs + 3m2 


g (Te Otz.x 


+ 


9^cr]al:r 


(1 + ^(4*. r, 8) ^ (1 + r, s) 


(86) 





Consequently, with the same assumption that the best approximated 'most
probable values' of the mean, the standard deviation, and the skewness be the
mean, the standard deviation, and the skewness of the parent population, the
distribution of $\mu_{3:x}$ is characterized by


= ^8 + 3/^2 


g<TsOtZ:x 


+ 


3^3 

g 

X" 


(1 “1“ 26*^)</>(a8;x, r, s) (t + 25rj.)*0®(a8:x, r, s) 


- \/^) -- ’ <■' + + ’W’) ’■ 


r(s — 1) 


(15 4- 637 + 30?’“) 


r, S) 


_ s — 2 r ^ / .s — 1 
_ 2 y r(s - r) 

s - 2 r / s - 1 ^,;.(1215 + 64177 + 7938'?‘' + 2520^») 
s - 2 y r(,s - r) (15 + 634 + 304*)« 

(s - l)(s 2 + s - 6 rs + 6 r 2 ). , 6 s(r - l)(s - r - 1 ) 

* r(.s- - r)(s - 2 )(s - 3) ' r(s - r)(s - 2)(s - 3) 

9720 + 4175554 + 27079924* + 58403434* 
(s _ I)(s 2 + s _ 6 r.s + 6 r 2 ) + 47895304* + 12474004* 

r (8 _ r)(s - 2 )('s - 3) “ “ ' " - ' (15 ^ 634 + 304*)* 

6 s(r — l)(s — ?■ — 1 ) 
r(.s — r) (s — 2 )(s — 3)' 


For an infinite parent population, we have


= W 3 + 3wi3 


<r^a3.I 


+ 


1 


2r<l>{a3:x, r) 8r* <f>Ka3-.x, r) 


( 88 ) 


... = ]/l (!•' 


(15 + 634 + 304*) —— 

<^>*(a3:x, r) 

71 ^vx(1215 + 64174 + 79384* + 25204*) 


(15 + 634 + 304*)* 


« 4 :A.:x 



9720 + 4175554 + 27079924“ + 58403434* + 47895304* 
+ 12474004* 

(15 + 634 + 304*)* 


Numerical illustration. Using the same sample in Table VIII, we wish to
ascertain the probability that the third moment of the parent population about
the mean lies between $\mu_{3:x} = 3000$ and $\mu_{3:x} = 4000$, still assuming an infinite
parent population from which the sample is drawn.

$\hat{\alpha}_{3:x} = .6501$
$\phi(\alpha_{3:x}, r) = 1.006,279$

We find from (88)

$M_{\mu_{3:x}} = 2558.137,096$
$\sigma_{\mu_{3:x}} = 1675.696,37$
$\alpha_{3:\mu_{3:x}} = -1.187,409,9$
$\alpha_{4:\mu_{3:x}} = 6.127,661,6$
$\delta = 0.221,886$
$\kappa = 0.714,972 < 1$

Therefore the best fitting curve is Type IV.

From Pearson's Tables for Statisticians and Biometricians, Vol. I, we compute

$y_0 = .000,058,032,3$

The given limits 3000 and 4000 when expressed in standard units are
t = .263,689 and t = .860,456 respectively. Therefore the probability that
$\mu_{3:x}$ lies between 3000 and 4000 may be expressed by

-r 

yf-.s 


-.860456 


(1 + X^) 


,-6.606819-17.44*447 


dx 


By means of the Maclaurin-Euler Interpolation Formula, the answer is found to
be .267,408,631.

We make the same remark here as we have made in the preceding two sections.
That is, since $\alpha_{3:\mu_{3:x}}$ and $\alpha_{4:\mu_{3:x}}$ are both in terms of s, r, and
$\alpha_{3:x}$, we cannot determine the value of $\kappa$, which is a function of
$\alpha_{3:\mu_{3:x}}$ and $\alpha_{4:\mu_{3:x}}$, until we know the values of s, r, and $\alpha_{3:x}$.
Consequently, the curve associated with the distribution of the hypothetical
third moments of a parent population of Type III distribution is not known
until we know s, r, and $\alpha_{3:x}$.


Pearson, K., op. cit., pp. 126-142.





ON A METHOD OF TESTING THE HYPOTHESIS THAT AN OBSERVED 
SAMPLE OF n VARIABLES AND OF SIZE N HAS BEEN 
DRAWN FROM A SPECIFIED POPULATION OF THE 
SAME NUMBER OF VARIABLES 

By John W. Fertig

With the Technical Assistance of Margaret V. Leary*

The problem of determining whether or not a given observation may be
regarded as randomly drawn from a certain population completely specified with
respect to its parameters is readily solved if the probability integral of that
population be known. In particular, if the population specified be a normal
population, one may calculate the relative deviate $(x - a)/\sigma$, where $a$ and $\sigma$
are the population mean and standard deviation respectively, and refer to tables
of the normal probability integral. The hypothesis that x was drawn from
this population may be rejected if P is less than an arbitrarily fixed value,
say ≤ .01. Generalizations of this problem may be made in two directions:
1) May a single observation simultaneously made on n variables be considered
as randomly drawn from a specified population of n variables? 2) May a
sample of one variable and of size N be regarded in its entirety as randomly
drawn from a specified univariate population?

The solution to the first problem for the case of sampling from a normal
population of n variables was given by Karl Pearson in 1908¹ as the
"Generalized Probable Error." Let

\[
\chi^2 = \sum_{i,j=1}^{n} \frac{P_{ij}}{P}\,\frac{(x_i - a_i)(x_j - a_j)}{\sigma_i \sigma_j}
\]

where $a_i$ and $\sigma_i$ are the population mean and standard deviation respectively
of the $i$th variable, and $P_{ij}$ is the usual cofactor of the element in the $i$th row
and $j$th column of the determinant P of population correlation coefficients.
That is,

\[
P = |\rho_{ij}|; \quad i, j = 1, 2, 3, \cdots, n.
\]

The probability of an observation yielding a smaller discrepancy than that
represented by the value of $\chi^2$, that is, of $\chi$ lying between 0 and the observed
$\chi$, may then be calculated from Tables of the Incomplete Normal Moment
Functions². The tables are entered in terms of $(\chi^2)^{1/2}$ and (n − 1), and the
tabled value multiplied by $(2\pi)^{1/2}$ or 2 depending upon whether n be even or
odd respectively.

* From the Memorial Foundation for Neuro-Endocrine Research and the Research
Service of the Worcester State Hospital, Worcester, Massachusetts.





The probability of an observation giving a greater discrepancy is then the
complement of this value. Obviously, this latter probability may be obtained
directly by entering tables of the $\chi^2$ distribution such as Elderton's³ with n
degrees of freedom, or through the use of Tables of the Incomplete Γ-Function⁴.

The second problem, limited to the case of sampling from a normal population,
was investigated by J. Neyman and E. S. Pearson in 1928⁵. The observed
sample may be regarded as a point in N-dimensional space, where N is the sample
size. Criteria for the acceptance or rejection of the hypothesis may be
associated with contour surfaces in this space, so that in moving out from contour
to contour the hypothesis becomes less and less reasonable. Frequently, contour
surfaces on which the mean or standard deviation is constant are used for
the testing of this hypothesis. Such surfaces are deficient inasmuch as they
are not "closed" contours. Another contour system which appears more
satisfactory is that of equiprobable pairs of m and s. The latter system in fact
encloses roughly the same region as do the separate contours for the means and
standard deviations. These systems are of course dependent on the particular
statistics chosen to describe the sample and are further limited in that they do
not take into account the probability of alternative hypotheses concerning the
origin of the sample.

Using the principle of maximum likelihood, Neyman and Pearson have developed
a system of contours which is free of the above limitations. The system
so derived is in fact quite similar to that of equiprobable pairs m and s. In a
later paper⁶, these same investigators have shown that this method of maximum
likelihood does enable one to select the most efficient criteria for the testing of
an hypothesis. The criterion selected on this basis is defined as

\[
\lambda = \frac{\text{Likelihood that sample came from specified population}}
{\text{Maximum likelihood that sample came from some other population}}
\]

where $a$ and $\sigma$ are the population mean and standard deviation respectively,
and $\bar{x}$ and $s$ the sample mean and standard deviation.

λ is constant upon certain contour surfaces in N-dimensional space, and
diminishes on passing outward. The form of the surfaces is independent of N. It
is evident that λ must lie between zero and unity. When it is close to unity
we know that it is reasonable to assume that our hypothesis is true; when
small we know that it is unreasonable. But we must know the probability of
λ less than a certain value occurring when the hypothesis tested is true, so that
we may control another source of error, namely, that of rejecting the hypothesis
when it is true. In other words, we must know the sampling distribution of λ,
so that we will reject the hypothesis only when the probability of obtaining a
smaller value is negligible, say $P_\lambda \leq .01$. Neyman and Pearson were not able
to evaluate this distribution but they were able to integrate the original density
function of the population appropriate to N-dimensional space outside of the





various λ contours. This they were able to do by effecting a transformation
of the density function and contours to the plane of m and s. These values
of $P_\lambda$ have been tabled by them⁷, the tables being entered in terms of N and
k, where

\[
k = \log_{10}\frac{\sigma^2}{s^2} + \frac{(\bar{x} - a)^2 + s^2}{\sigma^2}\,\log_{10} e
\]

The generalization of either of the above problems requires a criterion to
test an hypothesis which may be formulated as follows: Given a sample Σ of n
variables and of size N with means $\bar{x}_1, \bar{x}_2, \cdots, \bar{x}_n$, standard deviations
$s_1, s_2, \cdots, s_n$, and correlation coefficients
$r_{12}, \cdots, r_{1n}, r_{23}, \cdots, r_{2n}, \cdots, r_{(n-1)n}$,
may we regard this sample as randomly drawn from a population π of n variables
and completely specified with respect to all its parameters? We shall
restrict our inquiries to the case where π is a normal population. In this case
the distribution law is


f(Xi, X2, • • • , Xn) = 


1 

(2iryf^cri(T2 • • • (TnP^ 


where 


where $a_i$ and $\sigma_i$ are the population mean and standard deviation respectively
of the $i$th variable, and P and $P_{ij}$ are as previously defined.

Thus the probability that Σ has been drawn from π with its N values of
$x_{i\alpha}$ (i = 1, 2, ⋯, n) lying in the interval $x_{i\alpha} \pm \tfrac{1}{2}dx_{i\alpha}$
(α = 1, 2, ⋯, N) is given by


_ r_1_1^' 

_{2Try^^aia2 - — anP^^ 


e‘^dX 


where 


_(27r)”^2(7'i<72 

0 = 5 P.7 S r ~ ~ 


2P 


^ j) 8tS, Tj/ -j“ (Xt Oy) 

t,/-l L ' - 


dX= U fl dXia 

t 1 a*" 1 

The likelihood that Σ has been drawn from any other normal population,
such as π′, is given by


r 1 Tiv 

_ _f_ tiX 

' "L(2,rrv;,r;...<r:p'‘j 





where 


A' _ ^ / e p' 


+ (x,. - a'i){Xj - 


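The population from which it is most likely that Σ has been drawn is that
for which C′ is a maximum. The values of the parameters of this population
may be obtained by putting

\[
\frac{\partial C'}{\partial a'_i} = 0, \qquad
\frac{\partial C'}{\partial \sigma'_i} = 0; \qquad (i = 1, 2, \cdots, n)
\]
\[
\frac{\partial C'}{\partial \rho'_{ij}} = 0; \qquad (i, j = 1, 2, \cdots, n)
\]

These conditions are fulfilled when

\[
a'_i = \bar{x}_i; \quad \sigma'_i = s_i; \quad (i = 1, 2, \cdots, n)
\]
\[
\rho'_{ij} = r_{ij}; \quad (i, j = 1, 2, \cdots, n)
\]

So that

\[
C' = \left[\frac{1}{(2\pi)^{n/2}\, s_1 s_2 \cdots s_n R^{1/2}}\right]^{N} e^{-nN/2}\, dX
\]

where

\[
R = |r_{ij}|; \quad i, j = 1, 2, \cdots, n
\]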

The appropriate criterion to select in order to test our hypothesis is thus 


X = ^ = SlSz • • • 

~ CLr [.Oiai a„P’‘\ 


where 


_ iV J " P.V r8<s, r<,- + {Xi - ai){xi - a,) "I 

2 V-.f-i P L J 


The equations λ = constant represent a series of contours in N-dimensional
space. As we move outward from contour to contour our hypothesis becomes
less and less acceptable. Although we may be confident that the use of this
criterion will minimize the chance of accepting the hypothesis when it is false,
we must know the frequency with which samples occur outside of a given λ
contour when the hypothesis is true. In other words, we must know the
integral of C outside of various contours, or else we must know the sampling
distribution of λ. The former is an exceedingly difficult method for n greater than
unity. Thus for the case of n = 2 we should have to integrate some such
expression as

— rj,) * 





outside of the various contours. Nor have we so far been able to evaluate the
sampling distribution. We can, however, give an expression for the moments
of λ and thus reach an approximate distribution.

Wilks⁸ has derived expressions for the moment coefficients about zero for
the maximum likelihood criterion that k samples of n variables and of $N_t$
observations each have been drawn from the same unspecified normal population
of n variables. Thus,



from which we can write expressions giving the moment coefficients about zero 
for the X criterion for two samples 


= 


(iVi + N,) " 

nhN\ nhN% 

iV, * JVj ’ 


n 


n 

I -1 


/ 

r 

'Niil + k) 

-^1 

r 

^ 2(1 + h) — i 

r 

'Ni + N,- 


L 2 J 

2 

L 2 J 

, fT) 

Ir(^ 

- 1) 

|r 

■(ATi + iV,) (l+h) - f 

L 2 J 



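The limit of this latter expression as $N_2 \to \infty$ will be the moment coefficient
about zero for the λ criterion that one sample has been drawn from a specified
population. Thus

\[
\lim_{N_2 \to \infty} \mu'_h(\lambda) =
\left(\frac{2e}{N}\right)^{nNh/2} (1 + h)^{-nN(1+h)/2}
\prod_{i=1}^{n} \frac{\Gamma\!\left[\dfrac{N(1+h) - i}{2}\right]}{\Gamma\!\left[\dfrac{N - i}{2}\right]}
\]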

Various roots of λ are distributed to a good degree of approximation according
to a function of the form

\[
f(t) = \frac{\Gamma(m_1 + m_2)}{\Gamma(m_1)\,\Gamma(m_2)}\; t^{m_1 - 1}(1 - t)^{m_2 - 1}
\]

where

\[
m_1 = \mu'_1(\mu'_1 - \mu'_2)/(\mu'_2 - \mu'^{\,2}_1); \qquad m_2 = (1 - \mu'_1)\,m_1/\mu'_1
\]

and the value of $\mu'_h$ for roots of λ may be obtained by replacing h in the original
expression by h times the desired root. Measures of the skewness and kurtosis
of this distribution are given by

\[
B_1 = \frac{4(m_1 - m_2)^2 (m_1 + m_2 + 1)}{m_1 m_2 (m_1 + m_2 + 2)^2}
\]
\[
B_2 = \frac{3B_1(m_1 + m_2 + 2) + 6(m_1 + m_2 + 1)}{2(m_1 + m_2 + 3)}
\]

A comparison with the true measures of skewness and kurtosis for various roots
of λ as given by

\[
B_1 = \mu_3^2/\mu_2^3; \qquad B_2 = \mu_4/\mu_2^2
\]

will afford a measure of the goodness of the approximation and the range of
values of N for which any particular root will be distributed as assumed.

Investigating the moments for n from one to four and N from three to fifty 
we note that in the case of samples of two and three variables, X^/^ follows the 
assumed distribution for N from 3 to 15; X*/^ from 15.to 30; X®/^ from 30 to 50. 
In the case of four variables, follows the distribution for N from 5 to 10; 
X^^^ from 10 to 20; X^^^ from 20 to 40; X*^^ from 40 to 50. It appears likely that 
for higher values of n, for N small, some such root as X^^^at qj, \ijzh follow the 
assumed distribution, while as N increases smaller roots will follow it. For 
any value of n, the smallest permissible value of iV is (n + 1). 

The probability that a smaller value of λ will be obtained when the sample
has actually been drawn from π, i.e., $P_\lambda$, may thus be obtained by reference to
Tables of the Incomplete B-Function⁹ with p = $m_1$, q = $m_2$, x = value of the
particular root of the observed λ. We may also get the 1% and 5% levels of
significance directly from Fisher's¹⁰ tables of "z" or Snedecor's¹¹ tables of "F"
(= $e^{2z}$) by taking

\[
n_1 = 2m_2; \qquad n_2 = 2m_1; \qquad L = n_2/(n_2 + n_1 F),
\]

where L is the desired root of λ. Linear interpolation will generally suffice
except for very small values of N.

For the case of $N \to \infty$, we have

\[
\lim_{N \to \infty} \mu'_h(\lambda) = (1 + h)^{-\frac{1}{2}\sum_{i=2}^{n+1} i}
\]

Thus the quantity (−2 log λ) will be distributed in the $\chi^2$ distribution with
$\sum_{i=2}^{n+1} i$ degrees of freedom.
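The limiting significance levels of λ can be computed from this statement. The sketch below is illustrative only: it assumes the degrees of freedom are n(n + 3)/2 (one for each mean, standard deviation, and correlation coefficient), uses the closed-form upper tail of the $\chi^2$ distribution for integral degrees of freedom, and finds the critical value by bisection. The function names are hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with integer df exceeds x), from the closed-form series."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def lambda_level(n, alpha):
    """Value of lambda below which a fraction alpha of samples fall as N grows."""
    df = n * (n + 3) // 2            # assumed count of specified parameters
    lo, hi = 0.0, 400.0
    for _ in range(200):             # bisection on the chi-square critical value
        mid = (lo + hi) / 2
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return math.exp(-0.5 * (lo + hi) / 2)
```

For n = 1 the limiting distribution of λ is uniform, so `lambda_level(1, 0.05)` returns .05; for larger n the function gives the limiting value toward which the significance levels tend as N increases.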





A table of the 1% and 5% levels of significance for n equal one to four, and
values of N from five to ∞ is given below.


5% and 1% Levels of Significance of λ

                                        N
 n   Level          5     10     15     20     30     40     50      ∞

 1   5%           .025   .037   .041   .043   .045   .046   .047   .050
     1%           .003   .006   .008   .008   .009   .009   .009   .010

 2   5% × 10⁻²    .046   .173   .234   .269   .308   .330   .343   .392
     1% × 10⁻³    .026   .168   .260   .305   .372   .409   .428   .525

 3   5% × 10⁻³    .001   .036   .072   .097   .125   .143   .155   .211
     1% × 10⁻⁴    .000   .019   .047   .076   .101   .117   .128   .194

 4   5% × 10⁻⁵           .026   .106   .174   .295   .356   .418   .710
     1% × 10⁻⁶           .007   .040   .075   .145   .185   .221   .466


A check on the accuracy of the method of approximation used may be obtained
by comparing the values of $P_\lambda$ for the case of n = 1 with the exact values given
by Neyman and Pearson. For N = 10, $\lambda^{1/10}$ is distributed as assumed with
$m_1 = 9.0562$, $m_2 = 0.9987$. For the case of $(\bar{x} - a)/\sigma = 0.2$, $s/\sigma = 1.2$, we
find k = 0.48439, $\lambda^{1/10} = .94395$. From the Tables of the Incomplete B-Function
we find $P_\lambda = .5936$; from Neyman and Pearson's tables, .5935.
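The figures in this check can be reproduced numerically. The sketch below rests on assumptions that are not legible in the text as it stands: that the moment of λ for one sample of N from a specified n-variate normal population is $(2e/N)^{nNh/2}(1+h)^{-nN(1+h)/2}\prod_{i=1}^{n}\Gamma[(N(1+h)-i)/2]/\Gamma[(N-i)/2]$, that the univariate criterion is $\lambda = (s/\sigma)^N \exp\{-\tfrac{N}{2}[(\bar{x}-a)^2/\sigma^2 + s^2/\sigma^2 - 1]\}$, and that k is formed with common logarithms. All function names are hypothetical.

```python
import math


def lambda_moment(h, n, N):
    """h-th moment about zero of lambda (assumed form; see the lead-in)."""
    log_m = (n * N * h / 2) * (math.log(2.0 / N) + 1.0)      # (2e/N)^(nNh/2)
    log_m -= (n * N * (1 + h) / 2) * math.log(1 + h)
    for i in range(1, n + 1):
        log_m += math.lgamma((N * (1 + h) - i) / 2) - math.lgamma((N - i) / 2)
    return math.exp(log_m)


def beta_fit(root, n, N):
    """Fit m1, m2 to the first two moments of lambda**root."""
    mu1 = lambda_moment(root, n, N)          # first moment of the root
    mu2 = lambda_moment(2 * root, n, N)      # second moment of the root
    m1 = mu1 * (mu1 - mu2) / (mu2 - mu1 ** 2)
    m2 = (1 - mu1) * m1 / mu1
    return m1, m2


m1, m2 = beta_fit(0.1, n=1, N=10)

# Univariate example: (x-bar - a)/sigma = 0.2 and s/sigma = 1.2.
d, ratio = 0.2, 1.2
root_lambda = ratio * math.exp(-0.5 * (d ** 2 + ratio ** 2 - 1))   # lambda**(1/N)
k = (d ** 2 + ratio ** 2) * math.log10(math.e) - math.log10(ratio ** 2)
```

That these assumed forms return the printed values of $m_1$, $m_2$, k, and the root of λ is some evidence that they are the intended ones, though it does not settle the notation used in the original.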

No studies have been made on the extent of deviation from normality
permissible for the application of the test. There is no reason to doubt, however,
that as much deviation is permissible as in the case of the univariate λ. From
theoretical considerations and from sampling studies Neyman and Pearson
conclude that the univariate λ technique holds for deviation from normality to
the extent of ±0.5 for $B_1$ and 2.5 to 4.2 for $B_2$.

We are confident that this generalized λ technique will be found useful in
biological research. If the n variables were uncorrelated we would be able to
test whether the sample had been drawn from the population of n variables by
successive applications of the univariate λ technique and then combining the
resulting probabilities. In general, however, there will be some correlation
between the variables, however slight. The method here proposed will take
account of all possible intercorrelations, and consequently all multiple and
partial correlations.

Now, if $P_\lambda$ is less than some arbitrarily fixed value, say ≤ .01, we may decide
which variable or variables contribute most to this result by performing
simpler λ tests. It may be due to one or more of the means, standard deviations,
or correlation coefficients. As may often be the case, it is not due to any one
factor but to contributions from all of them. That is, all possible factors
tested separately might show a fairly reasonable value of P, but if all the
separate values are combined somehow, as by means of this λ method, the
resultant P may be too small. It is in such problems that this technique should
provide valuable information.

In case k samples of n variables are available it should be possible to
determine whether all of them have come from the same specified population of n
variables by performing k λ tests and combining the separate values of $P_\lambda$.
Such a hypothesis may best be tested, however, by a further extension of the λ
theory which the writers are at present investigating.

The following problem is chosen to illustrate the computations involved in 
the application of the test. Many of the investigations pursued at the Wor¬ 
cester State Hospital attempt to differentiate between schizophrenic patients 
and normal controls. In one such type of investigation various blood constit¬ 
uents were determined, namely, Urea Nj (mg./lOO cc.), Uric Acid N 2 
(mg./lOO cc.). Creatine Nj (mg./lOO cc.) for a sample of twenty-five schizo¬ 
phrenic patients. Previous investigations on these same variables for a large 
series of normal controls yielded constants which for the purpose of the 
example may be considered as the population parameters. Past studies on 
these variables have not shown any marked degree of non-normality for the 
various distributions. 

These variables are designated as

1 = Urea N₂; 2 = Uric Acid N₂; 3 = Creatine N₂

The parameters of the population are given by

$a_1 = 16.03$ ; $a_2 = 1.40$ ; $a_3 = 1.25$

$\sigma_1^2 = 20.268$ ; $\sigma_2^2 = 0.029$ ; $\sigma_3^2 = 0.025$

$\rho_{12} = .3075$ ; $\rho_{13} = .1232$ ; $\rho_{23} = .3853$

The statistics for the sample of twenty-five are

$\bar{x}_1 = 15.56$ ; $\bar{x}_2 = 1.42$ ; $\bar{x}_3 = 1.25$

$s_1^2 = 10.486$ ; $s_2^2 = 0.043$ ; $s_3^2 = 0.025$

$r_{12} = -.0161$ ; $r_{13} = .0925$ ; $r_{23} = .2174$

None of these statistics differs significantly from the corresponding parameters. 

$R = 0.9443$ ; $P = 0.7710$ ;

$P_{12}/P = -0.3373$ ; $P_{13}/P = -0.0061$ ; $P_{23}/P = -0.4506$ ;

$P_{11}/P = 1.1045$ ; $P_{22}/P = 1.2773$ ; $P_{33}/P = 1.1744$

IP = 12.5 (0.3802) = 4.7531 





$(s_1^2 s_2^2 s_3^2 R / \sigma_1^2 \sigma_2^2 \sigma_3^2 P) = 0.9001$
$\log \lambda = 12.5 \log (0.9001) - 4.7531 \log e = \bar{3}.3641$
$\lambda = .0023$

Since the 5% level of significance is about .0001, we thus conclude that the 
patients are not differentiated from the control population with respect to these 
variables. 
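The arithmetic of the example may be retraced with a short sketch. It assumes that P denotes the determinant of the population correlations and that $P_{ij}$ denotes its cofactors, which is consistent with the printed ratios; the variable names are illustrative only, and the last step simply takes the two intermediate values 0.9001 and 4.7531 as printed.

```python
import math

# Population correlations quoted in the example.
rho12, rho13, rho23 = 0.3075, 0.1232, 0.3853

def corr_det(r12, r13, r23):
    """Determinant of a 3 x 3 correlation matrix."""
    return 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2

P = corr_det(rho12, rho13, rho23)

# Cofactors of the correlation determinant, each divided by P
# (assumed meaning of the printed ratios P_ij / P).
P11 = (1 - rho23**2) / P
P22 = (1 - rho13**2) / P
P33 = (1 - rho12**2) / P
P12 = -(rho12 - rho13 * rho23) / P
P13 = (rho12 * rho23 - rho13) / P
P23 = -(rho23 - rho12 * rho13) / P

# Final step, using the intermediate values printed in the text.
half_N = 12.5      # N/2 for the sample of twenty-five
ratio = 0.9001     # ratio of generalized variances, as printed
penalty = 4.7531   # multiplier of log e, as printed
log10_lambda = half_N * math.log10(ratio) - penalty * math.log10(math.e)
lam = 10 ** log10_lambda

print(round(P, 4), round(lam, 4))
```

The common logarithm obtained is negative; written with a negative characteristic it agrees with the printed mantissa to about three decimals.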


REFERENCES 

1. Pearson, Karl. Biometrika, vol. 6, 1908. pp. 59-68.

2. Tables for Statisticians and Biometricians, Part I. pp. xxiv-xxviii, 22-23.

3. Ibid. pp. xxxi-xxxiii, 26-28.

4. Tables of the Incomplete Γ-Function, 1934.

5. Neyman, J. and Pearson, E. S. Biometrika, vol. 20, 1928. pp. 175-241.

6. Ibid. Phil. Trans. Roy. Soc. A, vol. 231, 1933. pp. 289-337.

7. Tables for Statisticians and Biometricians, Part II. pp. clxxx-clxxxv, 221-223.

8. Wilks, S. S. Biometrika, vol. 24, 1932. pp. 471-494.

9. Tables of the Incomplete Beta-Function, 1934.

10. Fisher, R. A. Statistical Methods for Research Workers. Fourth Edition, 1932.

11. Snedecor, G. W. Calculation and Interpretation of Analysis of Variance and Covariance, 1934.



ON CONFIDENCE RANGES FOR THE MEDIAN AND OTHER 
EXPECTATION DISTRIBUTIONS FOR POPULATIONS OF 
UNKNOWN DISTRIBUTION FORM 

By William R. Thompson 

About the commonest situation with which we are confronted in mathematical
statistics is that where we have a sample of n observations, $\{x_i\}$, which is
assumed to have been drawn at random from an unknown population, U, with a
zero probability that any two values in the finite sample be equal; and we
desire to obtain from this evidence some insight as to parameters of the parent
population, U. If further assumptions are made as to some of the parameters
or the form of U, there may result a gain in power in testing other given hypotheses
or establishing confidence ranges for particular parameters, but at an
obvious sacrifice of scope in application. Insistent problems involve estimation
of mathematical expectation that in further sampling we shall find x lying within
a given interval, or similar expectation with regard to parameters of U such as
the unknown median. It might seem that, without further assumption, all
we should claim is that it is possible to draw from U the sample actually observed.
A mere description of the experience may well be considered the
observer's first duty, but a restriction to this would leave entirely unused the
quality of randomness which has been assumed. What additional statements
as to U may be appropriate in view of this randomness are our immediate
concern; and the object of the present communication is to show how we may
obtain such expressions in the form of mathematical expectations, and to
present some results. Widespread applications to problems of estimation of
normal ranges of variation or specific confidence ranges and comparisons of
sample reflections of possibly different populations are immediately suggested,
and a new foundation is offered for the study of frequency-distribution from the
point of view of Schmidt.¹

Section 1 

Accordingly, consider the following situation. Let A = [x] denote the set of
all real numbers; and U denote an unknown frequency-distribution law of draft
from [x] such that there exists an unknown function, f(x), bounded and not
negative in A, and that the probability of obtaining x in an arbitrary interval
(α, β) is

(1)  $P(\alpha < x < \beta) = \int_\alpha^\beta f(x)\,dx$;

¹ Schmidt, R., Annals of Math. Stat., 5, 30, (1934).



and, for every positive p < 1, there exists a finite interval (α, β) such that
P(α < x < β) > p. Let U be called an infinite population; and let n drafts,
independently thus governed, made from A without replacements be called a
random sample of n observations from U. Let $S = \{x_k\}$, $k = 1, \cdots, n$, denote
such a sample; the enumeration to be made in an arbitrarily determined manner.
In any case $x_i \neq x_j$ for $i \neq j$.

Temporarily, let us consider k to indicate the order of draft of the values of
$\{x_k\}$, and let $p_k = P(x < x_k)$ denote the probability that x, drawn at random
from U, be less than $x_k$ of S. The probability à priori (i.e., without regard to
relative values of x in the sample) that in such random sampling $p_k$ lie between
p' and p'', where 0 ≤ p' < p'' ≤ 1, is obviously independent of k, and
equals p'' − p'; i.e., $p_k$ is equally likely à priori to lie in either of any two equal
intervals in its possible range, (0, 1). Furthermore, the probability that in
the rest of the sample, S, there will be just r values less than $x_k$ is

$\binom{n-1}{r}\, p_k^{\,r}\,(1 - p_k)^{n-1-r}$,

where r is an integer and 0 ≤ r < n. Of course, $p_k$ is unknown; but we may
calculate (for all cases in repeated sampling wherein the same value of r is
encountered) the expectation, $P(p' < p_k < p'')$, that $p_k$ lie in the interval
(p', p''). This is given by

(2)  $P(p' < p_k < p'') = \frac{n!}{r!\,s!} \int_{p'}^{p''} p^r q^s\,dp$,

where s = n − 1 − r, and q = 1 − p. This is a familiar result²,³,⁴ in applications
of the well-known principle of Bayes to estimation of à posteriori probability.
The approach is convenient in that many relations which have been developed
in this connection are made immediately available. However, that $p_k$ is
equally likely à priori to lie in either of any two equal intervals in its possible
range, is not based in the present case upon an especially added assumption
nor any plea concerning equal distribution of ignorance, but follows directly from
the elementary assumptions of random sampling. Accordingly, we are enabled
to develop for given ranges what may be called the specific confidence or mathematical
expectation that a given variable lie therein.

Obviously, (2) does not depend on k if this index is the order of draft provided 
that just r values of the sample, S, are less than the one under consideration, $x_k$.
To simplify notation, accordingly, let the index k for any given sample, 


² Bayes, Philosophical Transactions, 53, 370 (1763). Cf. Todhunter, I., “History of the
Mathematical Theory of Probability,” Macmillan and Co., London, 1865.

³ Laplace, “Théorie Analytique des Probabilités,” Paris, 1820; and other works. Cf.
Todhunter, l.c.

⁴ Pearson, K., Philosophical Magazine, Series 6, Vol. 13, 365, (1907).





be determined by the relations, $x_i < x_j$ for $i < j$, where $k = 1, \cdots, n$. Then,
by (2) as $k = r + 1$, we have

(3)  $P(p' < p_k < p'') = \frac{n!}{(k-1)!\,(n-k)!} \int_{p'}^{p''} p^{k-1} q^{n-k}\,dp$,

where $p_k$ is the probability that random sample values from U will be less than
the k-th value in order of ascending magnitude from a given random sample,
$\{x_k\}$, of n values from U; and $P(p' < p_k < p'')$ denotes the expectation that in
such sampling $p_k$ will lie in the interval, (p', p'').

In general, let $E(w) \equiv \bar{w}$ denote the mathematical expectation of any
variable, w, under the given sampling conditions. Then, from a well-known
relation developed by Laplace, we obtain from (3) the mean expectation of $p_k$,

(4)  $\bar{p}_k = \frac{k}{n+1}$;

and, further relations⁴ of Karl Pearson yield

(5)  $E\big((p_k - \bar{p}_k)^2\big) = \frac{k\,(n-k+1)}{(n+1)^2\,(n+2)}$;

i.e., the mean squared error in systematic use of $\frac{k}{n+1}$ instead of the unknown
$p_k$ should have the value in (5). Specific confidence ranges for x are readily
established; e.g., the expectation that in random draft from U we obtain x
within the range $(x_k, x_{n-k+1})$ in view of the sample, S, is

(6)  $P(x_k < x < x_{n-k+1}) = \frac{n - 2k + 1}{n+1}$, for $2k < n + 1$;

and $P(x < x_k) = P(x > x_{n-k+1}) = \frac{k}{n+1}$. For a given variate, w, the range
(α, β) will be called central if P(w < α) = P(w > β), as in the case under (6).
This is in accord with the development of the subject of confidence ranges by
Neyman⁵,⁶ and by Clopper and E. S. Pearson⁷ following the introduction of the
notion of fiducial interval by R. A. Fisher.⁸,⁹ The estimates of $p_k$ in (4) may
be of value in studying frequency-distribution from the point of view developed 

by Schmidt,^ by comparison of Xk with ^ ^ rather than where 

^ is a univariant inverse of the integral of a given frequency function, taken to 


⁵ Neyman, J., J. Roy. Stat. Soc., 97, 589, (1934).

⁶ Neyman, J., Annals of Math. Stat., 6, No. 3, 111, (1935).

⁷ Clopper, C. J., and Pearson, E. S., Biometrika, 26, 404, (1934).

⁸ Fisher, R. A., Proc. Camb. Phil. Soc., 26, 528, (1930).

⁹ Fisher, R. A., Proc. Roy. Soc., A 139, 343, (1933).





replace the unknown f(x). Obviously, $P(x_k < x < x_{k+1}) = \frac{1}{n+1}$. A discussion
of the special case, n = 2, has been prominent recently in a controversy
between Jeffreys¹⁰ and Fisher⁹,¹¹ and in an article by Bartlett.¹²
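Relations (4), (5) and (6) lend themselves to a small exact check. The following is a sketch only: the function names are illustrative, and the helper `moment` assumes the weighting $\frac{n!}{(k-1)!(n-k)!}\,p^{k-1}q^{n-k}$ of (3) taken over the whole range (0, 1).

```python
from fractions import Fraction
from math import factorial

def moment(k, n, j):
    """j-th moment of p_k under the weighting of (3) over (0, 1)."""
    return Fraction(factorial(n) * factorial(k - 1 + j),
                    factorial(k - 1) * factorial(n + j))

def mean_pk(k, n):
    """Relation (4): mean expectation of p_k."""
    return Fraction(k, n + 1)

def var_pk(k, n):
    """Relation (5): mean squared error of k/(n+1) used in place of p_k."""
    return Fraction(k * (n - k + 1), (n + 1) ** 2 * (n + 2))

def central_range(k, n):
    """Relation (6): expectation that a further draft lies in (x_k, x_{n-k+1})."""
    if 2 * k >= n + 1:
        raise ValueError("relation (6) requires 2k < n + 1")
    return Fraction(n - 2 * k + 1, n + 1)

# Example: second value from each end of a sample of nine.
print(mean_pk(2, 9), var_pk(2, 9), central_range(2, 9))
```

Exact fractions are used so that the agreement of the moments of (3) with (4) and (5) can be confirmed without rounding.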

Now, in (3) for p = p', and p" = 1; we may write^® 


(7)  $P(p < p_k) = I_q(n-k+1,\,k) = \frac{B_q(n-k+1,\,k)}{B_1(n-k+1,\,k)}$,

where q = 1 − p, and the incomplete B and I functions are those of K. Pearson¹³
and Müller.¹⁴ Now, let M be the unknown median of the infinite population,
U. Then, by definition of $p_k$, if and only if $x_k > M$, then $p_k > \frac{1}{2}$. Therefore,

(8)  $P(M < x_k) = P(0.5 < p_k) = I_{0.5}(n-k+1,\,k)$.

Obviously, $P(x_k < M < x_{k+1}) = \binom{n}{k}\left(\tfrac{1}{2}\right)^n$, and the expectation that M lie
between the k-th observations from each end of the set, S, is given by

(9)  $P(x_k < M < x_{n-k+1}) = 1 - 2\cdot I_{0.5}(n-k+1,\,k)$, for $2k < n + 1$.

Obviously, this confidence range is central.
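As an illustration of (8) and (9), the confidence attached to the range between the k-th observations from each end may be computed without tables. The sketch below relies on the standard identity, assumed here rather than taken from the text, that $I_{0.5}(n-k+1, k)$ equals the probability of fewer than k successes in n trials of chance one-half; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def prob_median_below(k, n):
    """I_0.5(n-k+1, k), computed as a binomial tail with chance 1/2."""
    return Fraction(sum(comb(n, j) for j in range(k)), 2 ** n)

def median_confidence(k, n):
    """Relation (9): expectation that x_k < M < x_{n-k+1}."""
    if 2 * k >= n + 1:
        raise ValueError("relation (9) requires 2k < n + 1")
    return 1 - 2 * prob_median_below(k, n)

# Ten observations, range between the second smallest and second largest.
print(median_confidence(2, 10))
```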


Section 2 

Now, consider another infinite population, U'. In similar manner we may
develop expressions for confidence ranges and distribution expectations. Let x'
be the variate, and consider a sample, $S' = \{x'_m\}$, of n' observations drawn
without replacements from A according to U' but after the sample, S, of U;
i.e., so that no two of these sample values in S' are equal, nor any of them equal
to a value in S. Furthermore, let m be the order of ascending magnitude of x'
values in S'; and $p'_m = P(x' < x'_m)$ for x' drawn at random from U', and let M'
be the unknown median of U'. Then, by replacement of x, n, $p_k$, k, and M by
x', n', $p'_m$, m, and M', respectively, in relations already developed for U and S,
we obtain corresponding expressions for U' and S'; e.g.,

(10)  $P(x'_m < x' < x'_{m+1}) = \frac{1}{n'+1}$.


¹⁰ Jeffreys, H., Proc. Roy. Soc., A 138, 48, (1932); A 140, 523, (1933); A 146, 9, (1934);
Proc. Camb. Phil. Soc., 29, 83, (1933).

¹¹ Fisher, R. A., Proc. Roy. Soc., A 146, 1, (1934).

¹² Bartlett, M. S., Proc. Roy. Soc., A 141, 518, (1933).

¹³ Pearson, K., Biometrika, 16, 202, (1924).

¹⁴ Müller, J. H., Biometrika, 22, 284, (1930-31).





Now, let the index values, $k_m$, be defined as the number of values of $\{x_k\}$ that
are less than $x'_m$, $m = 1, \cdots, n'$. Then, for all realized cases,

(11)  $x_{k_m} < x'_m < x_{k_m+1}$,  $m = 1, \cdots, n'$,

for the extreme members of (11) in S. Then, for x and x' drawn at random
from U and U', respectively, we may write

(12)  $0 < (n+1)(n'+1)\cdot P(x < x') - \sum_{m=1}^{n'} k_m < n + n' + 1$,

provided that the expectations for U and U' may be treated as independent.
Similarly, for P(M < M') we have the relations, 


(13) 


S - fcm + 1, U < < M'X 1 

m— 1 ' ^ 

+ • h.hijl — Ajm , Aw + 1) . 

w« 1 ' ' 


Of course, $I_{0.5}(n+1,\,0) \equiv 0$, and $I_{0.5}(0,\,n+1) \equiv 1$. It may be verified readily
that the inequality relations of (12) and (13) provide best upper and lower
bounds for P(x < x') and P(M < M') under the circumstances given.

Obviously, any increasing function, φ(y), for y in A, may be used throughout
the arguments, with φ(y) replacing y = x, $x_k$, M, x', $x'_m$, M', respectively.
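The bounds of relation (12) are easily obtained from two samples by counting. The sketch below is illustrative only: the function name is not from the text, and it assumes, as the text does, that no value of S' equals a value of S.

```python
from bisect import bisect_left
from fractions import Fraction

def bounds_prob_x_less(sample, sample_prime):
    """Lower and upper bounds for P(x < x') implied by relation (12)."""
    s = sorted(sample)
    n, n_prime = len(s), len(sample_prime)
    # k_m is the number of values of S that are less than x'_m.
    total = sum(bisect_left(s, v) for v in sample_prime)
    denom = (n + 1) * (n_prime + 1)
    return Fraction(total, denom), Fraction(total + n + n_prime + 1, denom)

lower, upper = bounds_prob_x_less([1, 3, 5], [2, 4])
print(lower, upper)
```

For the two small samples shown, the index values are 1 and 2, so that the expectation is confined between one quarter and three quarters.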


Section 3 


Consider, now, the case of a finite population, $U_N$, of real numbers $\{x^{(i)}\}$,
$x^{(i)} < x^{(j)}$ for $i < j$, $i = 1, \cdots, N$. Assume that N is known, and that a
sample, S, of n values has been drawn at random from $U_N$ without replacements.
Let the sample values be $\{x_k\}$, $k = 1, \cdots, n$; and k be an arbitrarily determined
index. As before, we might consider k the order of draft, temporarily, but the
same analysis may be made if we let k be the order of ascending magnitude in
the sample, S, and disregard its value in connection with à priori estimates of
draft probability. Each $x_k = x^{(u_k)}$ for some unknown $u_k = 1, \cdots, N$; and,
à priori (i.e., with no knowledge as to order of magnitude of other values in the
sample), any two of these values are equally likely. Obviously, this is so if $x_k$
is the first value drawn from $U_N$, and the rest of the sample may be regarded
as a random draft without replacements of n − 1 elements from $\{U_N - x_k\}$.
Let r be the number of these sample values less than $x_k$, and s = n − 1 − r.
Then the probability of drawing such a sample after the given $x_k$, under the
conditions given, is

$\binom{u_k - 1}{r}\binom{N - u_k}{s} \Big/ \binom{N-1}{n-1}$,

where $u_k - 1$ is the unknown number of values in $U_N$ that are less than $x_k$.
To estimate the expectation, $P(R = u_k - 1)$, that there are just a given number,
R, of values in $U_N$ less than $x_k$, we encounter
the same situation considered by K. Pearson in a paper¹⁵ subsequent to those
applied to the infinite universe; and, by a simple conversion in notation, we have


(14)  $P(R = u_k - 1) = \binom{R}{r}\binom{N-R-1}{s} \Big/ \binom{N}{n}$.

In previous communications¹⁶,¹⁷ I have defined a function,

(15)  $\psi(r, s, r', s') \equiv \dfrac{\sum_{\alpha=0}^{r'} \binom{r+r'-\alpha}{r}\binom{s+s'+1+\alpha}{s}}{\binom{r+s+r'+s'+2}{r+s+1}}$

for any four rational integers r, s, r', s' ≥ 0; and shown that Pearson's further
result, equivalent here to evaluation of $P(u_k \le R + 1)$ for a given R, may be
expressed by means of this ψ-function. Thus, we have

(16)  $P(u_k \le R + 1) = \psi(r,\, s,\, R - r,\, N - R - s - 2)$.


It was demonstrated also¹⁶ that

(17)  $\psi(r, s, r', s') = \psi(r, r', s, s') = \psi(s', r', s, r) = 1 - \psi(s, r, s', r')$

with extension of the definition to include $\psi(r, s, -1, s') \equiv 0$, and that

(18)  $\psi(r, s, r', s') = \dfrac{\sum_{\alpha=0}^{s} \binom{r+r'+1}{r+1+\alpha}\binom{s+s'+1}{s-\alpha}}{\binom{r+s+r'+s'+2}{r+s+1}}$.

As in the case of the infinite population, here also it is obvious that the order
of draft of $x_k$ is of no consequence in the analysis; and again we will let k = r + 1,
whence s = n − k, and we may make these substitutions in (14) and (16).
Then, we may write

(19)  $P(u_k \le R) = \psi(k - 1,\, n - k,\, R - k,\, k + N - R - n - 1)$;


¹⁵ Pearson, K., Biometrika, 20A, 149, (1928).

¹⁶ Thompson, W. R., Biometrika, 25, 286, (1933).

¹⁷ Thompson, W. R., American Journal of Mathematics, 57, 450, (1935).





and, obviously, $P(u_{n-k+1} \ge N - R + 1) = P(u_k \le R)$. Hence, if we let M

be the unknown median of $U_N$; and $m \equiv \frac{N - a}{2}$, where a = 0, 1, and N − a is
even; then, as $u_k$ is an integer,

, ^ ^ M g a:n-ifc+i) ^ P{ w* g ^ g Mn-k+i) 

(20) \ 2 / 

s 1 — 2*^(fc — l,n — fc, w — fc, fc + iV—m — n — 1), 


which is the expectation that the median of $U_N$ lie within the closed interval,
$(x_k, x_{n-k+1})$, for 2k ≤ n + 1. This gives the confidence range, analogous to
that for the infinite universe. It may be noted that

$P(u_k \le R < u_{k+1}) = P(u_k \le R) - P(u_{k+1} \le R)$

$= \psi(r, s, r', s') - \psi(r + 1,\, s - 1,\, r' - 1,\, s' + 1)$,

where r = k − 1, s = n − k, r' = R − k, and s' = k + N − R − n − 1.
Hence, (18) gives

(21)  $P(u_k \le R < u_{k+1}) = \binom{R}{k}\binom{N-R}{n-k} \Big/ \binom{N}{n}$.
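The ψ-function may be evaluated directly from either of its two sums. The sketch below is illustrative and rests on assumptions: the limits of summation (0 to r' for the form (15), 0 to s for the form (18)) are supplied here since the surviving text does not show them, and the function names are not from the source. Exact fractions are used so that the two forms can be compared term for term.

```python
from fractions import Fraction
from math import comb

def psi_15(r, s, rp, sp):
    """psi(r, s, r', s') by the sum of (15); an empty sum (r' = -1) gives 0."""
    denom = comb(r + s + rp + sp + 2, r + s + 1)
    num = sum(comb(r + rp - a, r) * comb(s + sp + 1 + a, s)
              for a in range(rp + 1))
    return Fraction(num, denom)

def psi_18(r, s, rp, sp):
    """psi(r, s, r', s') by the alternative sum of (18)."""
    denom = comb(r + s + rp + sp + 2, r + s + 1)
    num = sum(comb(r + rp + 1, r + 1 + a) * comb(s + sp + 1, s - a)
              for a in range(s + 1))
    return Fraction(num, denom)

def prob_uk_le(R, k, n, N):
    """Relation (19): expectation that u_k <= R."""
    return psi_15(k - 1, n - k, R - k, k + N - R - n - 1)

# Difference of two such expectations, as in the relation leading to (21).
N, n, k, R = 10, 4, 2, 5
gap = prob_uk_le(R, k, n, N) - prob_uk_le(R, k + 1, n, N)
print(gap, Fraction(comb(R, k) * comb(N - R, n - k), comb(N, n)))
```

In this instance the difference reduces to the single hypergeometric term, which is the simplification that (18) is said to provide.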



The approach by way of Pearson's problem again makes it easy to evaluate
the expected mean $\bar{p}_k$ and variance as in the case of the infinite population,
where $p_k = P(x < x_k)$ for x drawn at random from $U_N$. Of course, $p_k = \frac{u_k - 1}{N}$,
but $u_k$ is unknown. From Pearson's result,¹⁵ however, we obtain

(22)  $\bar{p}_k = \frac{k(N+1) - n - 1}{N(n+1)} = \frac{k}{n+1}\left(1 - \frac{n}{N}\right) + \frac{k-1}{N}$,

and the expected variance,

(23)  $E\big((p_k - \bar{p}_k)^2\big) = \frac{k\,(n-k+1)\,(N+1)\,(N-n)}{(n+1)^2\,(n+2)\,N^2}$.
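For a small population the mean and variance in (22) and (23) can be confirmed by direct enumeration of the expectations in (14). This is a sketch under the assumption that (14) is applied with r = k − 1 and s = n − k; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def expectation_of_uk(k, n, N):
    """Expectation, by (14), that u_k takes each value u = 1, ..., N."""
    total = comb(N, n)
    return {u: Fraction(comb(u - 1, k - 1) * comb(N - u, n - k), total)
            for u in range(1, N + 1)}

def mean_pk_finite(k, n, N):
    """Relation (22)."""
    return Fraction(k * (N + 1) - n - 1, N * (n + 1))

def var_pk_finite(k, n, N):
    """Relation (23)."""
    return Fraction(k * (n - k + 1) * (N + 1) * (N - n),
                    (n + 1) ** 2 * (n + 2) * N ** 2)

def enumerate_mean_var(k, n, N):
    """Mean and variance of p_k = (u_k - 1)/N by enumeration."""
    dist = expectation_of_uk(k, n, N)
    mean = sum(w * Fraction(u - 1, N) for u, w in dist.items())
    var = sum(w * (Fraction(u - 1, N) - mean) ** 2 for u, w in dist.items())
    return mean, var

print(enumerate_mean_var(1, 3, 5), mean_pk_finite(1, 3, 5), var_pk_finite(1, 3, 5))
```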


Yale University. 



THE SAMPLING DISTRIBUTION OF THE COEFFICIENT OF 

VARIATION 


By Walter A. Hendricks with the assistance of Kate W. Robey 
National Agricultural Research Center, Beltsville, Maryland 

The coefficient of variation does not appear to be of very great interest to 
statisticians in general. However, its use in biometry is sufficiently extensive 
for some knowledge of its sampling distribution to be desirable. The present 
paper is an attempt to satisfy this need. 

For the purposes of the following discussion, the coefficient of variation may be 
defined as the ratio of the standard deviation of a number of measurements to 
the arithmetic mean: 


$v = \dfrac{s}{\bar{x}}$    (1)

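As is well known, the probability that the mean of a sample of n measurements,
taken at random from a normal universe, lies between $\bar{x}$ and $\bar{x} + d\bar{x}$ and that the
standard deviation of the measurements in the same sample lies between s and
s + ds is given by the relation:

$dF_{\bar{x},s} = \dfrac{n^{n/2}}{2^{\frac{n}{2}-1}\,\pi^{1/2}\,\Gamma\!\left(\frac{n-1}{2}\right)\sigma^n}\; e^{-\frac{n}{2\sigma^2}\left[(\bar{x}-m)^2 + s^2\right]}\; s^{n-2}\, d\bar{x}\, ds$    (2)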


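If equation (2) is expressed in terms of polar coordinates by means of the
transformation: $\bar{x} = \rho\cos\theta$; $s = \rho\sin\theta$, it becomes a distribution function of ρ
and θ in which θ = arc tan v:

$dF_{\rho,\theta} = \dfrac{n^{n/2}}{2^{\frac{n}{2}-1}\,\pi^{1/2}\,\Gamma\!\left(\frac{n-1}{2}\right)\sigma^n}\; e^{-\frac{n}{2\sigma^2}\left(\rho^2 - 2m\rho\cos\theta + m^2\right)}\; \rho^{n-1}\sin^{n-2}\theta\; d\rho\, d\theta$    (3)

In equation (3), ρ may vary from 0 to ∞ and θ may vary from 0 to π. To find
the distribution function of θ, all that is necessary is to write:

$dF_\theta = k\left[\int_0^\infty e^{-(a\rho - b)^2}\,\rho^{n-1}\,d\rho\right] d\theta$    (4)

in which,

$k = \dfrac{n^{n/2}}{2^{\frac{n}{2}-1}\,\pi^{1/2}\,\Gamma\!\left(\frac{n-1}{2}\right)\sigma^n}\; e^{-\frac{n m^2}{2\sigma^2}\sin^2\theta}\,\sin^{n-2}\theta$,

$a = \dfrac{n^{1/2}}{2^{1/2}\,\sigma}$ and $b = \dfrac{n^{1/2}}{2^{1/2}\,\sigma}\, m\cos\theta$,

and to perform the indicated integration.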

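To evaluate the integral inside the brackets in equation (4), we may write:

$\int_0^\infty e^{-(a\rho - b)^2}\,\rho^{n-1}\,d\rho = \dfrac{1}{a^n}\sum_{i=0}^{n-1}\dfrac{(n-1)!}{(n-1-i)!\;i!}\; b^i \int_{-b}^{\infty} e^{-u^2}\,u^{n-1-i}\,du$    (5)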


Consider the integral, $\int_{-b}^{\infty} e^{-u^2}\,u^{n-1-i}\,du$. If b is sufficiently large, as is the
case when the parameters of equation (2) are of such magnitude that practically
the entire volume under the frequency surface lies to the right of the s axis, that
is to say, if negative and small positive values of $\bar{x}$ occur so infrequently that their
effects may be neglected, the lower limit, −b, of this integral may be replaced
by −∞ without introducing any appreciable error. The value of the integral,
$\int_{-\infty}^{\infty} e^{-u^2}\,u^{n-1-i}\,du$, is zero when n − 1 − i is odd and $\Gamma\!\left(\frac{n-i}{2}\right)$ when n − 1 − i
is even, zero being counted as an even number.

Subject to the above condition that b be sufficiently large, we may, therefore,
write equation (5) in the form:

$\int_0^\infty e^{-(a\rho - b)^2}\,\rho^{n-1}\,d\rho = \dfrac{1}{a^n}\,{\sum_{i=0}^{n-1}}'\,\dfrac{(n-1)!}{(n-1-i)!\;i!}\;\Gamma\!\left(\dfrac{n-i}{2}\right) b^i$    (6)

in which the symbol, Σ', indicates that the only terms entering into the summation
are those in which n − 1 − i is an even number.

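Substituting this expression for the integral inside the brackets in equation
(4), replacing k, a, and b by the quantities which they represent, and writing V
in place of the ratio, $\frac{\sigma}{m}$, we obtain the following distribution function of θ:

$dF_\theta = \dfrac{2}{\pi^{1/2}\,\Gamma\!\left(\frac{n-1}{2}\right)}\; e^{-\frac{n}{2V^2}\sin^2\theta}\,\sin^{n-2}\theta\;{\sum_{i=0}^{n-1}}'\,\dfrac{(n-1)!\;\Gamma\!\left(\frac{n-i}{2}\right) n^{i/2}}{(n-1-i)!\;i!\;2^{i/2}\,V^i}\,\cos^i\theta\; d\theta$    (7)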


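Equation (7) may be written in terms of v, if desired, by making the substitution,
θ = arc tan v:

$dF_v = \dfrac{2}{\pi^{1/2}\,\Gamma\!\left(\frac{n-1}{2}\right)}\; e^{-\frac{n}{2V^2}\cdot\frac{v^2}{1+v^2}}\;\dfrac{v^{n-2}}{(1+v^2)^{\frac{1}{2}n}}\;{\sum_{i=0}^{n-1}}'\,\dfrac{(n-1)!\;\Gamma\!\left(\frac{n-i}{2}\right) n^{i/2}}{(n-1-i)!\;i!\;2^{i/2}\,V^i\,(1+v^2)^{i/2}}\; dv$    (8)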


It must be emphasized that equation (8) has been derived on the hypothesis
that negative and small positive values of $\bar{x}$ occur so infrequently that they may
be neglected. However, since this condition is satisfied in the vast majority
of practical problems in which the coefficient of variation is likely to be used,
the limitation is not of much practical importance.



Fig. 1. Observed and Theoretical Distributions of Values of v for 512 Samples
of Numbers of Heads Appearing in Two Successive Tosses of Ten Coins
(horizontal axis: value of v)


As a test of the validity of equation (8), the authors calculated 512 coefficients
of variation of the numbers of heads appearing in two successive tosses of ten
coins. The coins were tossed 1024 times, thus yielding 512 samples, each consisting
of two observations. For these data we have m = 5, σ = 1.581, and
V = 0.3162.

For the case, n = 2, equation (8) reduces to:

$dF_v = \dfrac{2}{\pi^{1/2}\,V}\;\dfrac{e^{-\frac{1}{V^2}\cdot\frac{v^2}{1+v^2}}}{(1+v^2)^{3/2}}\; dv$    (9)
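Equations (8) and (9) may be compared numerically. In the sketch below the function names are illustrative, the integration is a plain trapezoid rule chosen for simplicity, and the closed value erf(1/V) used for comparison follows from the substitution $t = v/\sqrt{1+v^2}$ in (9); it is a check supplied here, not a statement from the paper.

```python
import math

def cv_density(v, n, V):
    """Equation (8): approximate frequency of v for samples of n."""
    pref = 2.0 / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))
    core = (math.exp(-n * v * v / (2 * V * V * (1 + v * v)))
            * v ** (n - 2) / (1 + v * v) ** (n / 2))
    total = 0.0
    for i in range(n):
        if (n - 1 - i) % 2:          # only even n - 1 - i enter the sum
            continue
        total += (math.factorial(n - 1) * math.gamma((n - i) / 2) * n ** (i / 2)
                  / (math.factorial(n - 1 - i) * math.factorial(i)
                     * 2 ** (i / 2) * V ** i * (1 + v * v) ** (i / 2)))
    return pref * core * total

def cv_density_n2(v, V):
    """Equation (9): the special case n = 2."""
    return (2.0 / (math.sqrt(math.pi) * V)
            * math.exp(-v * v / (V * V * (1 + v * v))) / (1 + v * v) ** 1.5)

def trapezoid(f, a, b, steps=50000):
    """Composite trapezoid rule on [a, b]."""
    h = (b - a) / steps
    return h * (0.5 * f(a) + sum(f(a + j * h) for j in range(1, steps)) + 0.5 * f(b))

V = 0.3162                            # value for the coin-tossing data
area = trapezoid(lambda v: cv_density_n2(v, V), 0.0, 50.0)
print(area, math.erf(1 / V))
```

For V = 0.3162 the total frequency falls short of unity only by a negligible amount, which illustrates how little is lost by the approximation when V is small.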


Figure 1 shows the distribution of the 512 values of v obtained from the coin 
tossing experiment, together with the theoretical distribution given by equation 
(9). 

An inspection of Figure 1 indicates that the agreement between the observed
and theoretical frequencies is fairly good. An application of the familiar χ²
test for goodness of fit showed the agreement to be rather poor. According to
this test, the degree of discrepancy between theory and observation could have
arisen by chance less than once in a hundred trials. However, the discrepancies
may be partly due to the fact that data distributed in a discrete fashion were
treated by methods appropriate to the analysis of data distributed according
to a continuous frequency curve.

As another test of the validity of equation (8), the authors calculated 149
coefficients of variation of “days to maturity,” which is the length of time elapsing
between the date of hatch of a chicken and the time egg production commences,
for samples of two observations made upon Rhode Island Red pullets.
Figure 2 shows the observed distribution of the 149 coefficients of variation,
together with the theoretical distribution given by equation (9).

Fig. 2. Observed and Theoretical Distributions of Values of v for 149 Samples
of “Days to Maturity” in Rhode Island Red Pullets for Samples of Two
Observations (horizontal axis: value of v)

In applying equation (9) to these data, the parameter, V, had to be evaluated
from the data. The best estimates of the values of m, σ, and V which could be
obtained from the 298 measurements of “days to maturity” are m = 210.477,
σ = 18.6991, V = 0.0888415. The theoretical distribution shown in Figure 2
is based on this value of V.

The agreement between theory and observation shown by Figure 2 is very
good. In this case, the χ² test showed that the degree of discrepancy encountered
could have arisen by chance about six times in ten trials.



SOME NOTES ON EXPONENTIAL ANALYSIS 

By H. R. Grummann 

Assistant Professor, Department of Applied Mathematics, Washington University 


M. E. J. Gheury de Bray in his charming little book “Exponentials Made
Easy”¹ tells how to determine the constants in the equation,

(I)  $y = A_1 e^{a_1 x} + A_2 e^{a_2 x}$

so that the curve will pass through four points, with equidistant ordinates on
an empirical curve. If (Fig. 1) $y_0$, $y_1$, $y_2$, and $y_3$ are the equidistant ordinates
and δ is their common separation, $y_0$ being the y intercept of the curve, de
Bray's formulas are:


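(II)  $a_1 = \dfrac{\log z_1}{\delta}$,  $a_2 = \dfrac{\log z_2}{\delta}$,

where $z_1$ and $z_2$ are the roots of the quadratic equation

(III)  $\begin{vmatrix} z^2 & z & 1 \\ y_3 & y_2 & y_1 \\ y_2 & y_1 & y_0 \end{vmatrix} = 0$.

The coefficients $A_1$ and $A_2$ of the two exponential terms are obtained by solving
the two simultaneous equations

(IV)  $A_1 + A_2 = y_0$
      $A_1 z_1 + A_2 z_2 = y_1$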
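The procedure of formulas II, III and IV can be sketched in a few lines. The function name and the use of complex arithmetic throughout are choices made here for illustration; the quadratic is obtained by expanding the determinant III along its first row, and the logarithm is taken as the natural logarithm.

```python
import cmath

def de_bray_fit(y0, y1, y2, y3, delta):
    """Two-term exponential fit through four equidistant ordinates.

    Returns ((A1, a1), (A2, a2)); the values are complex in general.
    """
    # Determinant III expanded along its first row: c2 z^2 + c1 z + c0 = 0.
    c2 = y2 * y0 - y1 * y1
    c1 = -(y3 * y0 - y1 * y2)
    c0 = y3 * y1 - y2 * y2
    root = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    z1 = (-c1 + root) / (2 * c2)
    z2 = (-c1 - root) / (2 * c2)
    # Equations IV: A1 + A2 = y0 and A1 z1 + A2 z2 = y1.
    A1 = (y1 - y0 * z2) / (z1 - z2)
    A2 = y0 - A1
    # Formulas II.
    a1 = cmath.log(z1) / delta
    a2 = cmath.log(z2) / delta
    return (A1, a1), (A2, a2)

# Ordinates taken from a known curve y = 2 e^{0.5 x} + 3 e^{-0.2 x}, delta = 1.
ys = [2 * cmath.exp(0.5 * x).real + 3 * cmath.exp(-0.2 * x).real for x in range(4)]
print(de_bray_fit(*ys, 1.0))
```

Working in complex numbers means that a negative or complex root of III is carried through without error, so that the cases discussed later in the paper can be inspected directly. The sketch fails by division by zero when the ordinates lie on a single exponential, since the leading coefficient of the quadratic then vanishes.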


In attempting to find suitable empirical equations for some “river rating
curves” (graphs of discharge versus stage) the writer tried to make use of
de Bray's procedure. The original intention was to use the above method to
determine the constants, and then to correct these constants by the use of
Least Squares, as done by J. W. T. Walsh² in an application of the method to
a problem in radioactivity. It often happens that a series of plotted observations
suggest a simple exponential function, but that when the observations
are replotted on semi-logarithmic paper a straight line is not obtained. Often,
as in the case of a good many river rating curves, the result may be described

¹ Macmillan & Co. Ltd., St. Martin's St., London W. C. 2.

² Proceedings Phys. Soc. London XXXII. This reference is given by de Bray in his
book, “Exponentials Made Easy.”




as “almost straight.” At first blush it might seem that in all such cases it
ought to be possible to fit a curve with equation I to the data by de Bray's
Method. By an easy generalization of the above formulas, the constants in
an equation with three or four exponential terms could be determined if two
terms were not enough to secure a good fit.

It was soon found, however, that innocent looking monotonic curves without
points of inflection plotted from data that gave an “almost straight” line on
semi-logarithmic paper quite often led to a quadratic equation (equation III)
whose roots were not both positive numbers.





If $z_1$ and $z_2$, the roots of III, are complex conjugates, it may be seen from IV
that $A_1$ and $A_2$ will be complex conjugates. Also, $a_1$ and $a_2$ will be conjugate
complex numbers and may be calculated as follows:

Let $z_1 = re^{i\theta}$ and $z_2 = re^{-i\theta}$;
then from equation II,

$re^{i\theta} = e^{a_1\delta}$,  $re^{-i\theta} = e^{a_2\delta}$,

whence, by division to eliminate r we have

$e^{2i\theta} = e^{\delta(a_1 - a_2)}$, or

(Va)  $\dfrac{2i\theta}{\delta} = a_1 - a_2$.

Also, by multiplication to eliminate θ,

$r^2 = e^{\delta(a_1 + a_2)}$, or

(Vb)  $\dfrac{2\log r}{\delta} = a_1 + a_2$.


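The sum and difference of the two a's being obtained by these expressions,
one may solve for $a_1$ and $a_2$.

Let  $a_1 = \lambda + i\mu$,  $A_1 = \alpha + i\beta$,
     $a_2 = \lambda - i\mu$,  $A_2 = \alpha - i\beta$.

Then equation I becomes

$y = (\alpha + i\beta)\,e^{(\lambda + i\mu)x} + (\alpha - i\beta)\,e^{(\lambda - i\mu)x}$,

$y = 2e^{\lambda x}\,[\alpha\cos\mu x - \beta\sin\mu x]$, or

(VI)  $y = Re^{\lambda x}\cos(\mu x + c)$

where $R = 2\sqrt{\alpha^2 + \beta^2}$ and $\tan c = \dfrac{\beta}{\alpha}$.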
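The passage from a conjugate pair of exponential terms to the single trigonometric term of VI can be checked numerically. The following is a sketch; the function names are illustrative, and the two-argument arctangent is used so that the phase c falls in the proper quadrant, a detail that the relation for tan c alone does not settle.

```python
import cmath
import math

def to_trig_form(A1, a1):
    """Constants (R, lam, mu, c) of equation VI from A1 = alpha + i beta
    and a1 = lam + i mu; the second term is taken as the conjugate."""
    alpha, beta = A1.real, A1.imag
    lam, mu = a1.real, a1.imag
    R = 2.0 * math.hypot(alpha, beta)
    c = math.atan2(beta, alpha)
    return R, lam, mu, c

def y_trig(x, R, lam, mu, c):
    """Equation VI."""
    return R * math.exp(lam * x) * math.cos(mu * x + c)

A1, a1 = 1.0 + 0.5j, -0.1 + 0.7j
R, lam, mu, c = to_trig_form(A1, a1)
x = 2.0
pair = A1 * cmath.exp(a1 * x) + A1.conjugate() * cmath.exp(a1.conjugate() * x)
print(pair.real, y_trig(x, R, lam, mu, c))
```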


If one of the roots of III is negative, the de Bray formulas II and IV will
still give an expression for equation I which formally reproduces $y_0$, $y_1$, $y_2$, and
$y_3$ when 0, δ, 2δ, and 3δ, are substituted for x respectively, but which is useless
for interpolating and of no value as a solution of the curve fitting problem.
Suppose, for example, that $z_1$ is positive and $z_2$ is negative. Then

$z_2 = (-1)\,|z_2|$

and

$\log z_2 = \log(-1) + \log|z_2|$.

Equation I then becomes

$y = A_1 e^{a_1 x} + (-1)^{x/\delta}\,A_2\, e^{\frac{x}{\delta}\log|z_2|}$,

the factor $(-1)^{x/\delta}$ being real only when x is an integral multiple of δ. If the
(−1) is written $e^{i\pi}$ we have

$y = A_1 e^{a_1 x} + A_2\, e^{\frac{i\pi x}{\delta}}\, e^{\frac{x}{\delta}\log|z_2|}$, or

$y = A_1 e^{a_1 x} + A_2\, e^{\frac{x}{\delta}\log|z_2|}\left[\cos\dfrac{\pi x}{\delta} + i\sin\dfrac{\pi x}{\delta}\right]$.

Neither the real nor the imaginary part would be a graduation function for a
monotonic curve as each has a half period of δ.

The expression for I is similar, and of no greater practical value, if both of
the roots of III are negative.





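Without loss of generality we may let $y_0 = 1$, $r_1 = \dfrac{y_1}{y_0}$, $r_2 = \dfrac{y_2}{y_1}$, $r_3 = \dfrac{y_3}{y_2}$. Then
the quadratic III becomes

$\begin{vmatrix} z^2 & z & 1 \\ r_2 r_3 & r_2 & 1 \\ r_1 r_2 & r_1 & 1 \end{vmatrix} = 0$, or,

written in the form

$z^2 + pz + q = 0$, i.e.,

(IIIa)  $z^2 + \dfrac{r_2(r_1 - r_3)}{(r_2 - r_1)}\,z + \dfrac{r_1 r_2(r_3 - r_2)}{(r_2 - r_1)} = 0$.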


Hence the roots of this quadratic are real and unequal if D > 0, equal if D = 0,
and complex if D < 0, where


’rs _ 

1 

Cl 

CO 

1 

+ 4 r -' 

+ -1 



L ''2 



From the point of view of the computer, however, it is about as much work
to calculate D as to solve the quadratic equation.


Fig. 2


Reverting to equation IIIa; suppose the numbers q and p are plotted as the
coordinates of a point (q, p) as in Fig. 2. Then the parabola $p^2 = 4q$ is, so to
speak, a locus of equal roots. The remainder of the figure requires no explanation.

Suppose that all the r's are positive, as they would be in the case of a simple
monotonic curve which one proposed subjecting to an exponential analysis.
If q < 0, the quadratic will have one negative root. Now

$q = \dfrac{r_1 r_2(r_3 - r_2)}{(r_2 - r_1)}$;

for q < 0, if $r_2 > r_1$, then $r_3 < r_2$ and consequently $r_3 < r_2 > r_1$, and if $r_2 < r_1$,
then $r_3 > r_2$, or $r_1 > r_2 < r_3$. Also, provided $p^2 > 4q$, a positive p and a positive
q will give two negative roots. But

$p = \dfrac{r_2(r_1 - r_3)}{(r_2 - r_1)}$

and p and q can not both be positive when all the r's are positive as this implies
either that $r_2 > r_1$, $r_1 > r_3$ and $r_3 > r_2$, a contradiction, or else that $r_2 < r_1$,
$r_1 < r_3$ and $r_3 < r_2$, also a contradiction. Hence if both roots are negative, the
r's can not be all positive. The case of two negative roots will not arise in
trying to fit equation I to a monotonic curve, since if all the r's are positive
both p and q can not be positive.

For all r's positive, provided $p^2 > 4q$, a positive q and a negative p will give
two positive roots. But

$q = \dfrac{r_1 r_2(r_3 - r_2)}{(r_2 - r_1)} > 0$,

and

$-p = \dfrac{r_2(r_3 - r_1)}{(r_2 - r_1)} > 0$

means that $r_3 > r_2 > r_1$ or $r_3 < r_2 < r_1$.

To sum up: If all the r's are positive, de Bray's method of exponential
analysis is possible (a) when D < 0 and the roots of III are complex; (b) when
D > 0 and $r_1 > r_2 > r_3$ or when $r_1 < r_2 < r_3$.
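The summary above can be turned into a short test applied to four ordinates before any fitting is attempted. This is a sketch only: the function name and the wording of the verdicts are illustrative, and the ordinary discriminant $p^2 - 4q$ of IIIa is used to separate real from complex roots, which is an assumption made here and need not coincide in form with the quantity D of the paper.

```python
def classify_roots(y0, y1, y2, y3):
    """Return (p, q, verdict) for the quadratic IIIa built from the ratios r."""
    r1, r2, r3 = y1 / y0, y2 / y1, y3 / y2
    # Coefficients of z^2 + p z + q = 0; r2 == r1 would mean a single exponential.
    p = r2 * (r1 - r3) / (r2 - r1)
    q = r1 * r2 * (r3 - r2) / (r2 - r1)
    disc = p * p - 4 * q
    if disc < 0:
        verdict = "complex conjugate roots"
    elif q < 0:
        verdict = "one negative root"
    elif p < 0:
        verdict = "two positive roots"
    else:
        verdict = "two negative roots"
    return p, q, verdict

# Ordinates of y = 2^x + 0.5^x at x = 0, 1, 2, 3 (roots 2 and 0.5).
print(classify_roots(2.0, 2.5, 4.25, 8.125))
# Successive ratios 2, 1.5, 1.2 lead to complex roots.
print(classify_roots(1.0, 2.0, 3.0, 3.6))
```

In the first case the ratios increase steadily and two positive roots result, in agreement with condition (b); in the second the roots are complex and the trigonometric form VI would be required.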

Figure 3 gives a picture of the second condition (b) of the preceding paragraph.
Suppose an exponential curve is passed through the first two points
on the empirical curve with ordinates $y_0$ and $y_1$. Its equation will be:

$y = y_0\left(\dfrac{y_1}{y_0}\right)^{x/\delta} = y_0\, r_1^{\,x/\delta}$.

Suppose also that $y_2$ is less than the ordinate to this curve when x = 2δ. Now
pass an exponential curve through $y_1$ and $y_2$ using a new axis of ordinates
coinciding with $y_1$. Its equation is

$y = y_1\left(\dfrac{y_2}{y_1}\right)^{x/\delta} = y_1\, r_2^{\,x/\delta}$,

or referred to the original axis:

$y = y_1\, r_2^{\,(x-\delta)/\delta}$.

Now if the graduation is possible without using trigonometric functions, $y_3$
must be less than the ordinate of this second curve when x = 3δ.



Fig. 4 


It is natural to inquire if the state of affairs is not similar to this, for the
cases of fitting curves with equations similar to I but having three or four exponential
terms on the right hand side instead of only two. If three terms are
used (see Fig. 4) to find constants in

(Ia)  $y = A_1 e^{a_1 x} + A_2 e^{a_2 x} + A_3 e^{a_3 x}$





it is first necessary to find the roots of the cubic

(IIIa)  $f(z) = \begin{vmatrix} z^3 & z^2 & z & 1 \\ y_5 & y_4 & y_3 & y_2 \\ y_4 & y_3 & y_2 & y_1 \\ y_3 & y_2 & y_1 & y_0 \end{vmatrix} = 0$.

Now, f(z) will have no negative roots if f(−z) has no changes of sign. But
writing the conditions that the cofactors of the elements of the first row in
the above determinant have the same signs, and assuming that all the y's are
positive, one does not get a series of conditions analogous to $r_3 > r_2 > r_1$ or
$r_3 < r_2 < r_1$.

In the following, formulas will be derived for finding the constants in equation
Ia after the roots of IIIa have been determined. Also formulas will be obtained
for finding the constants in

(Ib)  $y = A_1 e^{a_1 x} + A_2 e^{a_2 x} + A_3 e^{a_3 x} + A_4 e^{a_4 x}$

after the roots of

(IIIb)  $\begin{vmatrix} z^4 & z^3 & z^2 & z & 1 \\ y_7 & y_6 & y_5 & y_4 & y_3 \\ y_6 & y_5 & y_4 & y_3 & y_2 \\ y_5 & y_4 & y_3 & y_2 & y_1 \\ y_4 & y_3 & y_2 & y_1 & y_0 \end{vmatrix} = 0$

have been found. Both sets of formulas have been tested by an “exponential
analysis” of the same body of data, viz., the very accurate recent determinations
by the U. S. Bureau of Standards of the saturation pressure of water
vapor above 100°C.³

For the case of three exponential terms in the graduation function, the a's
are found by formulas like II or V, after the roots of the cubic are found. If
$z_1$, $z_2$, $z_3$ are the roots, the A's are obtained by solving the simultaneous equations

(IVa)  $A_1 + A_2 + A_3 = y_0$
       $A_1 z_1 + A_2 z_2 + A_3 z_3 = y_1$
       $A_1 z_1^2 + A_2 z_2^2 + A_3 z_3^2 = y_2$


³ Osborne, Stimson, Fiock, and Ginnings: The Pressure of Saturated Water Vapor in
the Range 100° to 374°C. Bureau Standards Journal of Research, Vol. 10, Febr. 1933,
page 178.





This presents no new difficulty unless two of the roots are conjugate complex 
numbers. In this event, if we let $z_1$ = the real positive root, $z_2 = re^{i\theta}$ and 
$z_3 = re^{-i\theta}$, the determinant D of the equations IVa may be written 

\[
D = \begin{vmatrix}
1 & 1 & 1 \\
z_1 & re^{i\theta} & re^{-i\theta} \\
z_1^2 & r^2 e^{2i\theta} & r^2 e^{-2i\theta}
\end{vmatrix}
\]

or, expanded in terms of the elements of the first column and their minors, 

\[ D = 2i[z_1 r^2 \sin 2\theta - (r^3 + z_1^2 r)\sin\theta], \]

a pure imaginary. Similarly, 

\[ A_1 D = 2i[r^2 y_1 \sin 2\theta - (y_0 r^3 + y_2 r)\sin\theta], \]

also a pure imaginary, so that $A_1$ is real. Having calculated $A_1$, it is substituted 
in the first two of equations IVa, which are then solved for $A_2$ and $A_3$. $a_2$ 
and $a_3$ are then determined by formulas Va and Vb, replacing the subscripts 1 
and 2 in those formulas by the subscripts 2 and 3 respectively. Finally the 
two exponential terms corresponding to the complex roots of the cubic are 
combined into a single trigonometric term as in equation VI. 
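The three-term procedure just described can be checked numerically. The following is only a sketch under stated assumptions: the roots and coefficients are hypothetical test values (not taken from the vapor pressure data), the function name `a1_three_terms` is invented for illustration, and the y's are generated from known A's so that the closed forms for D and $A_1 D$ can be compared with the values they should recover.

```python
import cmath
import math

def a1_three_terms(z1, r, theta, y0, y1, y2):
    """A_1 from the closed forms for D and A_1*D (z2, z3 = r*exp(+-i*theta))."""
    s1, s2 = math.sin(theta), math.sin(2 * theta)
    d = 2j * (z1 * r**2 * s2 - (r**3 + z1**2 * r) * s1)
    a1_d = 2j * (r**2 * y1 * s2 - (y0 * r**3 + y2 * r) * s1)
    # Both quantities are pure imaginary, so their quotient is real.
    return (a1_d / d).real

# Hypothetical roots and coefficients, used only to manufacture test data.
z1, r, theta = 1.3, 0.9, 0.7
z2 = cmath.rect(r, theta)
z3 = z2.conjugate()
true_A = (2.0, 0.5 + 0.25j, 0.5 - 0.25j)
y0, y1, y2 = [
    (true_A[0] * z1**n + true_A[1] * z2**n + true_A[2] * z3**n).real
    for n in range(3)
]

A1 = a1_three_terms(z1, r, theta, y0, y1, y2)
# Substitute A1 in the first two of equations IVa and solve for A2, A3.
c0, c1 = y0 - A1, y1 - A1 * z1
A2 = (c1 - c0 * z3) / (z2 - z3)
A3 = c0 - A2
print(A1, A2, A3)
```

In this sketch $A_2$ and $A_3$ come out as complex conjugates, which is what allows the two complex exponential terms to be merged into one real trigonometric term.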

The necessary formulas for the case of four exponential terms in the graduation 
function will be discussed briefly. The equations 

\[
\begin{aligned}
A_1 + A_2 + A_3 + A_4 &= y_0 \\
A_1 z_1 + A_2 z_2 + A_3 z_3 + A_4 z_4 &= y_1 \\
A_1 z_1^2 + A_2 z_2^2 + A_3 z_3^2 + A_4 z_4^2 &= y_2 \\
A_1 z_1^3 + A_2 z_2^3 + A_3 z_3^3 + A_4 z_4^3 &= y_3
\end{aligned} \tag{IVb}
\]

have to be solved for the A's. The z's are the roots of IIIb. Two cases will 
be considered: First case: $z_1$ and $z_2$ are complex conjugates and $z_3$ and $z_4$ are 
complex conjugates. Second case: $z_1$ and $z_2$ are complex conjugates and $z_3$ 
and $z_4$ are real and positive. In either event $A_1$ and $A_2$ are complex conjugates, 
as will be proved below. Formulas for $A_1$ are given for both cases. 
Then $A_2$ is known since it is the conjugate of $A_1$. Having found $A_1$ and $A_2$, 
let 

\[
\begin{aligned}
C_0 &= y_0 - (A_1 + A_2) \\
C_1 &= y_1 - (A_1 z_1 + A_2 z_2)
\end{aligned}
\]

Both $C_0$ and $C_1$ are then real. To get $A_3$ and $A_4$ solve the equations: 

\[
\begin{aligned}
A_3 + A_4 &= C_0 \\
A_3 z_3 + A_4 z_4 &= C_1
\end{aligned}
\]



SOME NOTES ON EXPONENTIAL ANALYSIS 




A pair of exponential terms with conjugate complex coefficients will then be 
expressed as a single real trigonometric term as in VI. 

The determinant of equations IVb may be written 

\[ D = (z_1 - z_2)(z_1 - z_3)(z_1 - z_4)(z_2 - z_3)(z_2 - z_4)(z_3 - z_4). \tag{VII} \]

First case: Let $z_1 = a + ib$, $z_2 = a - ib$, $z_3 = \alpha + i\beta$, $z_4 = \alpha - i\beta$. Then D 
may be written 

\[ D = -4b\beta\,[(a - \alpha)^2 + (b - \beta)^2]\,[(a - \alpha)^2 + (b + \beta)^2], \tag{VIIa} \]

which is real. Now 

\[
A_1 D + A_2 D =
\begin{vmatrix}
y_0 & 1 & 1 & 1 \\
y_1 & z_2 & z_3 & z_4 \\
y_2 & z_2^2 & z_3^2 & z_4^2 \\
y_3 & z_2^3 & z_3^3 & z_4^3
\end{vmatrix}
+
\begin{vmatrix}
1 & y_0 & 1 & 1 \\
z_1 & y_1 & z_3 & z_4 \\
z_1^2 & y_2 & z_3^2 & z_4^2 \\
z_1^3 & y_3 & z_3^3 & z_4^3
\end{vmatrix}
= (z_1 - z_2)
\begin{vmatrix}
0 & y_0 & 1 & 1 \\
1 & y_1 & z_3 & z_4 \\
(z_1 + z_2) & y_2 & z_3^2 & z_4^2 \\
(z_1^2 + z_1 z_2 + z_2^2) & y_3 & z_3^3 & z_4^3
\end{vmatrix}
\]

and this is real since $(z_1 - z_2)$ is a pure imaginary and the minors of the real 
elements of the first column of the determinant are all pure imaginaries. Hence 
$A_1$ and $A_2$ are complex conjugates since when each is expressed as a quotient 
of two determinants by Cramer's rule, the sum of the two numerators is real 
and the common denominator is also real. 

For purposes of numerical calculation $A_1$ may be obtained from 

\[ A_1 = \frac{NP}{D} \]

in which D is obtained from VIIa, 

\[ N = y_3 - (z_2 + z_3 + z_4)\,y_2 + (z_2 z_3 + z_2 z_4 + z_3 z_4)\,y_1 - (z_2 z_3 z_4)\,y_0, \]

and 

\[ P = (z_2 - z_3)(z_2 - z_4)(z_3 - z_4) = 2\beta\,[(a - \alpha)\,2b + i\{(a - \alpha)^2 + (\beta^2 - b^2)\}], \]

a complex number. 

If $z_1 z_2 = r^2$ and $z_3 z_4 = \rho^2$ the symmetric functions of the z's in the above formula 
may be calculated from 

\[
\begin{aligned}
z_2 z_3 z_4 &= (a - ib)\rho^2 \\
z_2 z_3 + z_2 z_4 + z_3 z_4 &= \rho^2 + 2\alpha(a - ib) \\
z_2 + z_3 + z_4 &= (a - ib) + 2\alpha
\end{aligned}
\]





For the second case, which is exemplified by the vapor pressure data, 

\[ D = 2ib\,[(a - z_3)^2 + b^2]\,[(a - z_4)^2 + b^2]\,[z_3 - z_4], \tag{VIIb} \]

a pure imaginary. The sum of the two numerators of $A_1$ and $A_2$, namely 

\[
(z_1 - z_2)
\begin{vmatrix}
0 & y_0 & 1 & 1 \\
1 & y_1 & z_3 & z_4 \\
z_1 + z_2 & y_2 & z_3^2 & z_4^2 \\
z_1^2 + z_1 z_2 + z_2^2 & y_3 & z_3^3 & z_4^3
\end{vmatrix}
\]

is a pure imaginary, since $(z_1 - z_2)$ has this character, and the determinant 
has nothing but real elements. Hence $A_1$ and $A_2$ are still complex conjugates 
when $z_3$ and $z_4$ are real, $z_1$ and $z_2$ being complex conjugates. 

For purposes of numerical calculation $A_1$ may be obtained from 

\[ A_1 = \frac{N}{(z_1 - z_2)(z_1 - z_3)(z_1 - z_4)}. \]

Here $(z_1 - z_2)$ is a pure imaginary and the other three factors are complex. 

Let 

\[
\begin{aligned}
N &= r_1(\cos\theta_1 + i\sin\theta_1) \\
z_1 - z_3 &= r_2(\cos\theta_2 + i\sin\theta_2) \\
z_1 - z_4 &= r_3(\cos\theta_3 + i\sin\theta_3)
\end{aligned}
\]

Then 

\[ A_1 = \frac{r_1[\cos(\theta_1 - \theta_2 - \theta_3) + i\sin(\theta_1 - \theta_2 - \theta_3)]}{(z_1 - z_2)\,r_2 r_3}. \]

In calculating N by the formula given for it in the preceding paragraph, the 
symmetric functions of the z's were obtained from 

\[
\begin{aligned}
z_2 z_3 z_4 &= (a - ib)\,z_3 z_4 \\
z_2 z_3 + z_2 z_4 + z_3 z_4 &= (a - ib)(z_3 + z_4) + z_3 z_4 \\
z_2 + z_3 + z_4 &= (a - ib) + z_3 + z_4.
\end{aligned}
\]
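As an illustration of the second case, the sketch below forms N from the symmetric functions of $z_2, z_3, z_4$, divides by $(z_1 - z_2)(z_1 - z_3)(z_1 - z_4)$ to get $A_1$, and then recovers $A_3$ and $A_4$ from $C_0$ and $C_1$. All numerical values are hypothetical test inputs and the function name `a1_four_terms` is an assumption made for the example; nothing here is taken from the paper's computations.

```python
def a1_four_terms(z, y):
    """A_1 = N / ((z1-z2)(z1-z3)(z1-z4)), N built from z2, z3, z4."""
    z1, z2, z3, z4 = z
    y0, y1, y2, y3 = y
    e1 = z2 + z3 + z4
    e2 = z2 * z3 + z2 * z4 + z3 * z4
    e3 = z2 * z3 * z4
    n = y3 - e1 * y2 + e2 * y1 - e3 * y0
    return n / ((z1 - z2) * (z1 - z3) * (z1 - z4))

# Hypothetical second-case roots: one conjugate pair, two real positive roots.
z = (0.8 + 0.5j, 0.8 - 0.5j, 1.2, 1.5)
true_A = (1 - 2j, 1 + 2j, 3.0, -0.5)
y = [sum(A * zj**n for A, zj in zip(true_A, z)).real for n in range(4)]

A1 = a1_four_terms(z, y)
A2 = A1.conjugate()                      # A2 is the conjugate of A1
C0 = (y[0] - (A1 + A2)).real             # real by construction
C1 = (y[1] - (A1 * z[0] + A2 * z[1])).real
A4 = (C1 - C0 * z[2]) / (z[3] - z[2])    # from A3 + A4 = C0, A3 z3 + A4 z4 = C1
A3 = C0 - A4
print(A1, A3, A4)
```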


Example 

The first two of the following tables are abstracted from Table 2, p. 178 of 
Bureau Standards Research Paper No. 523. The third table is abstracted from 
Table 3, p. 179, et seq. of that publication. x is the number of degrees centigrade 
above 100°. y is the pressure of saturated water vapor in International 
Standard Atmospheres. In the first two of the following tables, the values 
of y are observed values. In the third, they are interpolated or graduated 
values calculated at the Bureau of Standards. 


TABLE I 



TABLE III 

      x         y 
      0      1.0000 
     39      3.4666 
     78      9.4490 
    117     21.612 
    156     43.392 
    195     78.974 
    234    133.64 
    273    215.37 


The observed values of y in Table I are reproduced by the following formula 
used in conjunction with a standard six place table of logarithms and trigonometric 
functions: 

\[ y = 3.967433\, e^{\ldots x} \cos(.4085758x - 75^\circ 24' 03''.7). \tag{I} \]

The observed values of y in Table II are reproduced by the following formula 
used in conjunction with a standard six place table of logarithms and trigonometric 
functions. 

\[ y = 3.0253744\, e^{\ldots x} + 2.2171657\, e^{\ldots x} \cos(155^\circ 59' 35''.5 - 0.7899232x). \tag{II} \]










Hence the formula is presumably an excellent one for interpolation between 
the values of y listed in Table II, if the greatest accuracy is not needed.^ 

The values of y in Table III are reproduced exactly to five significant figures 
by the following formula used in conjunction with a standard six place table 
of logarithms and trigonometric functions. 

\[ y = 3.8902543\, e^{\ldots x} - .164787\, e^{\ldots x} + 2.743000\, e^{.009884290x} \cos(.7860725x + 186^\circ 28' 53''.2). \]

By means of this formula the saturation pressure of water vapor was calculated 
for every five degrees from 100°C to 370°C in order to make comparisons with 
the corresponding "smoothed" values in Table 2 of the Bureau of Standards 
publication referred to above. The discrepancies were never more than one in 
the fourth significant figure and generally less. The poorest agreement was 
in the ranges of temperature from 100°C to 135°C and from 245°C to 270°C. 

It is a pleasure to acknowledge the intelligent and painstaking assistance of 
Mr. G. D. Lambert, undergraduate student at Washington University, for doing 
most of the computing. 

Washington University, 

St. Louis, Mo. 


^ The values of y in Table III (not counting the value of y for x = 0) are reproduced 
by it with an average error of .13% and a largest error (for x = 234°) of .30%. Four of 
the errors are negative and three positive. 



ON THE FREQUENCY DISTRIBUTION OF CERTAIN RATIOS 

By H. L. Rietz 
University of Iowa 


Considerable interest in the distribution of ratios, t = y/x, has no doubt 
been suggested by important applications. For example, we may mention the 
opsonic index in bacteriology, the ratio of systolic to diastolic blood pressure 
in physiology, and ratios such as link relatives or certain index numbers in 
economics. 

In 1910, Karl Pearson¹ gave certain properties of the distribution of ratios 
by means of approximate formulas for moments up to order four in terms of 
means, variances, product moments, and coefficients of variability of x and y. 
The resulting formulas did not give, with sufficient accuracy, the constants of 
the distribution of the opsonic index for the purpose of Dr. Greenwood to whom 
Pearson attributed the derivation of the formulas for the special case in which 
x and y are uncorrelated. Pearson next adopted the plan of tabulating the 
reciprocals, say x' = 1/x, and then finding the constants of the distribution of 
the product yx' in the case in which x' and y are uncorrelated. He then obtained 
satisfactory results in illustrative examples. 

In 1929, C. C. Craig² obtained the semi-invariants of y/x in terms of moments 
of x and y, and then expressed the moments in terms of the semi-invariants of 
the distribution function, f(x, y), of x and y. By this means, he was able to 
deal with the case in which x and y are normally correlated under suitable 
conditions. Craig found it desirable to restrict the distribution of x in such a 
way that the probability of a zero value of x is an infinitesimal of sufficiently 
high order that a certain integral exists. This limitation seems to imply in 
applications to actual data that no zero values of x are to occur. This suggests 
that we deal with the cases of x at or near zero with considerable care. 

By starting with the assumption that the values of x and y are a set of 
normally distributed pairs of values with correlation coefficient r, and by considering 
the quotient $z = \frac{b + y}{a + x}$, a and b being constants, R. C. Geary,³ in a 
paper published in 1930, found an algebraic function, u = f(z), of fairly simple 
form with the property that u is nearly normally distributed with arithmetic 
mean zero and standard deviation unity provided that a + x is unlikely to 
have negative values. Here we have again a suggestion to exercise special 
care in the case of quotients with the divisor near zero or negative. 

¹ On the constants of index distributions, Biometrika, Vol. 7 (1910), pp. 631-^46. 

² The frequency function of y/x. Annals of Mathematics, Vol. 30 (1928-29), pp. 471-486. 

³ The frequency distribution of the quotient of two normal variates, J. Royal Statistical 
Society, Vol. XCIII (1930), pp. 442-7. 

In 1932, Fieller⁴ obtained in explicit form the approximate distribution of 
t = y/x where values (x, y) are drawn from the bivariate normal distribution 

\[
\frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\,
\exp\left\{-\frac{1}{2(1 - r^2)}\left[\frac{(x - \bar{x})^2}{\sigma_x^2} + \frac{(y - \bar{y})^2}{\sigma_y^2} - \frac{2r(x - \bar{x})(y - \bar{y})}{\sigma_x\sigma_y}\right]\right\}
\]

under the condition that $\bar{x}$ is large compared with $\sigma_x$. 

Very recently Kullback⁵ found the distribution law of the quotient, t = y/x, 
where x and y are drawn from Pearson Type III parent populations given by 

\[ f_1(x) = \ldots;\qquad f_2(y) = \ldots,\qquad 0 \le x \le \infty,\ 0 \le y \le \infty. \]

It is fairly easy to see, in a general way, that the distribution of t = y/x 
depends very much on the location of the origin as well as on the parent distribution 
from which x and y are drawn. This fact will be fairly obvious from the 
present paper whose main purpose is to give clear geometrical descriptions of 
the distributions of ratios, t = y/x, for each of several cases in which (x, y) 
are points taken at random from certain simple geometrical figures conveniently 
located with respect to the origin. 

In accord with the suggestions to be cautious when the divisor is near zero 
or negative, we consider first the very simple case of ratios t = y/x obtained 
from points uniformly distributed over a rectangle such as is shown in Fig. 1 
with sides parallel to coordinate axes and $a_1 > 0$, $b_1 > 0$. As indicated on Fig. 1, 
we assume for simplicity that the coordinates of the points are positive and 
$a_1 \le x \le a_2$, $b_1 \le y \le b_2$. 

⁴ E. C. Fieller, The distribution of the index in a normal bivariate population, Biometrika, 
Vol. 24 (1932), pp. 428-440. 

⁵ Solomon Kullback, Annals of Mathematical Statistics, Vol. VII (1936), pp. 51-63. 

Case I. When $\frac{b_1}{a_1} \le \frac{b_2}{a_2}$. Fig. 1. 


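Let k dx dy be the probability that a point (x, y) taken at random in the 
rectangle will fall into dx dy where k is a constant. Then 

\[ k\int_{b_1}^{b_2}\!\int_{a_1}^{a_2} dx\,dy = k(a_2 - a_1)(b_2 - b_1) = 1, \]

and 

\[ k = \frac{1}{(a_2 - a_1)(b_2 - b_1)}. \]

Transform the element k dx dy into one with variables t and x by making 

\[ x = x, \qquad y = tx. \]

The Jacobian is | x | = x. 

The new element is k x dx dt and is to be integrated over the range on x for 
an assigned t in order to get the probability, to within infinitesimals of higher 
order, that a random t falls into an assigned dt. By assigning t any value such 
that $\frac{b_1}{a_2} \le t \le \frac{b_1}{a_1}$, say t is the slope of MN (Fig. 1), we have 

\[ k\int_{b_1/t}^{a_2} x\,dx\,dt = \frac{k}{2}\left(a_2^2 - \frac{b_1^2}{t^2}\right)dt, \tag{1} \]

the limits of integration being indicated by the ends of the line MN. 

When the assigned t is such that $\frac{b_1}{a_1} \le t \le \frac{b_2}{a_2}$, say t is the slope of the line 
M'N', we have 

\[ k\int_{a_1}^{a_2} x\,dx\,dt = \frac{k}{2}(a_2^2 - a_1^2)\,dt. \tag{2} \]

When the assigned t is such that $\frac{b_2}{a_2} \le t \le \frac{b_2}{a_1}$, say it is the slope of 
M''N'', we have 

\[ k\int_{a_1}^{b_2/t} x\,dx\,dt = \frac{k}{2}\left(\frac{b_2^2}{t^2} - a_1^2\right)dt. \tag{3} \]

Thus, from (1), (2), (3), when as in Fig. 1, $\frac{b_1}{a_1} \le \frac{b_2}{a_2}$, the frequency function 
of t is given by 

\[ F(t) = \frac{k}{2}\left(a_2^2 - \frac{b_1^2}{t^2}\right) \quad\text{when}\quad \frac{b_1}{a_2} \le t \le \frac{b_1}{a_1}, \tag{4} \]

\[ F(t) = \frac{k}{2}(a_2^2 - a_1^2) \quad\text{when}\quad \frac{b_1}{a_1} \le t \le \frac{b_2}{a_2}, \tag{5} \]

\[ F(t) = \frac{k}{2}\left(\frac{b_2^2}{t^2} - a_1^2\right) \quad\text{when}\quad \frac{b_2}{a_2} \le t \le \frac{b_2}{a_1}. \tag{6} \]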


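See Fig. 2 for the general form of the frequency curve F(t) when $\frac{b_1}{a_1} < \frac{b_2}{a_2}$ 
with the segment from $t = \frac{b_1}{a_1}$ to $\frac{b_2}{a_2}$ a horizontal straight line and with discontinuities 
in the first derivatives of F(t) at $t = \frac{b_1}{a_1}$ and $t = \frac{b_2}{a_2}$. 

[Fig. 2: general form of the frequency curve F(t) in Case I.] 

When $a_1 \to 0$, and $b_1 = 0$, the frequency curve approaches 

\[ F(t) = \frac{a_2}{2b_2} \quad\text{when}\quad 0 \le t \le \frac{b_2}{a_2}, \tag{7} \]

\[ F(t) = \frac{b_2}{2a_2 t^2} \quad\text{when}\quad \frac{b_2}{a_2} < t \le \frac{b_2}{a_1}. \tag{8} \]

It may be noted that the curve given by making $a_1 = 0$ and $b_1 = 0$ extends 
to infinity, and that the first and second moments about the origin are each 
infinite. 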
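The three-branch frequency function of Case I can be written down directly and checked for unit area. The sketch below is illustrative only: the rectangle dimensions are hypothetical, the function names are invented, and the midpoint rule stands in for the exact integration.

```python
def ratio_density_case1(t, a1, a2, b1, b2):
    """Frequency function of t = y/x for a uniform rectangle, Case I."""
    if b1 / a1 > b2 / a2:
        raise ValueError("Case I requires b1/a1 <= b2/a2")
    k = 1.0 / ((a2 - a1) * (b2 - b1))
    if b1 / a2 <= t <= b1 / a1:
        return 0.5 * k * (a2**2 - b1**2 / t**2)   # rising branch
    if b1 / a1 < t <= b2 / a2:
        return 0.5 * k * (a2**2 - a1**2)          # horizontal segment
    if b2 / a2 < t <= b2 / a1:
        return 0.5 * k * (b2**2 / t**2 - a1**2)   # falling branch
    return 0.0

def midpoint_integral(f, lo, hi, n=20000):
    """Plain midpoint rule, adequate for a continuous piecewise-smooth curve."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Hypothetical rectangle: 1 <= x <= 3, 1 <= y <= 4, so b1/a1 = 1 <= b2/a2 = 4/3.
a1, a2, b1, b2 = 1.0, 3.0, 1.0, 4.0
total = midpoint_integral(
    lambda t: ratio_density_case1(t, a1, a2, b1, b2), b1 / a2, b2 / a1
)
print(total)
```

The computed area should be very close to unity, which is the check that the three branches have been joined with the right limits.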




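Case II. When $\frac{b_1}{a_1} > \frac{b_2}{a_2}$. 

If the rectangle in Fig. 1 were moved upward keeping its sides parallel to the 
x and y axes until $\frac{b_1}{a_1} > \frac{b_2}{a_2}$, we would obtain 

\[ F(t) = \frac{k}{2}\left(a_2^2 - \frac{b_1^2}{t^2}\right) \quad\text{if}\quad \frac{b_1}{a_2} \le t \le \frac{b_2}{a_2}, \tag{9} \]

\[ F(t) = \frac{k}{2t^2}(b_2^2 - b_1^2) \quad\text{if}\quad \frac{b_2}{a_2} \le t \le \frac{b_1}{a_1}, \tag{10} \]

\[ F(t) = \frac{k}{2}\left(\frac{b_2^2}{t^2} - a_1^2\right) \quad\text{if}\quad \frac{b_1}{a_1} \le t \le \frac{b_2}{a_1}. \tag{11} \]

By comparing (5) and (10), it may be observed that F(t) of the middle segment 
of the distribution curve differs much in Case II from its corresponding 
constant value in Case I. 

By moving the rectangle of Fig. 1 downward, keeping its sides parallel to the 
x and y axes until $b_1$ is negative, we easily find further forms of the distribution 
curve F(t). 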
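Both cases can be obtained from one rule: for an assigned t the line y = tx lies inside the rectangle for x between the larger of $a_1$, $b_1/t$ and the smaller of $a_2$, $b_2/t$, and k x dx is integrated between those limits. The sketch below assumes a rectangle in the positive quadrant and t > 0; the function name and the numerical rectangle are hypothetical.

```python
def ratio_density_rectangle(t, a1, a2, b1, b2):
    """F(t) for a uniform rectangle with 0 < a1 < a2, 0 < b1 < b2 and t > 0."""
    k = 1.0 / ((a2 - a1) * (b2 - b1))
    lo = max(a1, b1 / t)   # where the line y = t x enters the rectangle
    hi = min(a2, b2 / t)   # where it leaves
    return 0.5 * k * (hi**2 - lo**2) if hi > lo else 0.0

# Hypothetical Case II rectangle: b1/a1 = 3 > b2/a2 = 2.
a1, a2, b1, b2 = 1.0, 2.0, 3.0, 4.0
t = 2.5                                   # lies in the middle segment
k = 1.0 / ((a2 - a1) * (b2 - b1))
middle_branch = k / (2 * t**2) * (b2**2 - b1**2)
general = ratio_density_rectangle(t, a1, a2, b1, b2)
print(general, middle_branch)
```

Writing the density through the entry and exit abscissas avoids having to decide in advance which case a given rectangle falls under.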

To consider the distribution of the ratio t = y/x for another very simple type 
of distribution of x and y, suppose we have given the distribution function 

\[ f(x, y) = k e^{-\frac{x}{a} - \frac{y}{b}} \qquad (x \ge c > 0,\ y \text{ non-negative};\ a > 0,\ b > 0) \tag{12} \]

where 

\[ \iint f(x, y)\,dx\,dy = 1. \]

Then 

\[ k = \frac{e^{c/a}}{ab}. \]

In this case, 

\[ F(t) = k\int_c^\infty x\, e^{-\frac{x}{a} - \frac{tx}{b}}\,dx = \frac{e^{-ct/b}}{b + at}\left(c + \frac{ab}{b + at}\right), \tag{13} \]

a monotone decreasing function from t = 0 to t = ∞. 
With c = 0 as a limiting value, we obtain 

\[ F(t) = \frac{ab}{(b + at)^2}, \tag{14} \]

a distribution curve with the mean value of t at infinity. 
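The closed form (13) can be compared with a direct numerical integration of k x exp(-x/a - tx/b) over x from c upward. This is a sketch under assumptions: the parameter values are hypothetical, the infinite range is truncated where the integrand is negligible, and the normalizing constant is taken as k = exp(c/a)/(ab), which is what unit total probability requires.

```python
import math

def F_closed(t, a, b, c):
    """Closed form of the frequency function of t = y/x, equation (13)."""
    return math.exp(-c * t / b) / (b + a * t) * (c + a * b / (b + a * t))

def F_numeric(t, a, b, c, n=200000, span=80.0):
    """Midpoint-rule value of k * integral of x*exp(-x/a - t*x/b) from c."""
    k = math.exp(c / a) / (a * b)
    hi = c + span * a              # truncation point; the tail is negligible
    h = (hi - c) / n
    total = 0.0
    for i in range(n):
        x = c + (i + 0.5) * h
        total += x * math.exp(-x / a - t * x / b)
    return k * h * total

a, b, c, t = 2.0, 3.0, 1.0, 0.7    # hypothetical values
closed = F_closed(t, a, b, c)
numeric = F_numeric(t, a, b, c)
limit_c0 = F_closed(t, a, b, 0.0)  # should reduce to equation (14)
print(closed, numeric, limit_c0)
```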





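If we should similarly consider 

\[ f(x, y) = \frac{2}{\pi\sigma_x\sigma_y}\, e^{-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}} \qquad (x \text{ and } y \text{ non-negative}) \tag{15} \]

we easily obtain 

\[ F(t) = \frac{2}{\pi}\,\frac{\sigma_x\sigma_y}{\sigma_y^2 + \sigma_x^2 t^2} \tag{16} \]

as the distribution function. 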

Although the difficulties⁶ of the problem of the distribution of the ratio y/x 
when x and y are normally correlated have been overcome⁷ to a considerable 
extent, still the examination of some very simple geometric cases of non-normal 
but linear correlation may not be without some interest. Such a case 
will now be considered. 

[Fig. 3] 

For one very simple case in which x and y are correlated, suppose we are 
given a set of points (x, y) uniformly distributed over the parallelogram ABCD 
(Fig. 3) with sides AD and BC parallel to the y-axis so that the regression of 
y on x is linear as shown by the line RS. 

The equation of RS is 

\[ y = m(x - a_1) + \frac{b_1 + b_2}{2}. \tag{17} \]

⁶ Loc. cit., Pearson, p. 681. 

⁷ Loc. cit., C. C. Craig, R. C. Geary, E. C. Fieller. 




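Then although $x_i$ and $y_i$ are correlated, $x_i$ and 

\[ y_i' = y_i - m(x_i - a_1) - \frac{b_1 + b_2}{2} \]

are uncorrelated. Let us consider the distribution of the ratio $\frac{y_i'}{x_i}$. 

Consider the element of frequency k dx dy', where 

\[ k(b_2 - b_1)(a_2 - a_1) = 1. \tag{18} \]

Change variables to x and t' by the transformation 

\[ y' = t'x. \]

Then the element of frequency becomes 

\[ kx\,dx\,dt'. \tag{19} \]

Next integrate (19) with respect to x under the restriction that t' is assigned. 
Three cases occur: 

(a) When $-\frac{b_2 - b_1}{2a_2} \le t' \le \frac{b_2 - b_1}{2a_2}$, we obtain by integration of (19) for the 
element of relative frequency of t' in dt', 

\[ k\int_{a_1}^{a_2} x\,dx\,dt' = \frac{k}{2}(a_2^2 - a_1^2)\,dt'. \tag{20} \]

(b) When $\frac{b_2 - b_1}{2a_2} \le t' \le \frac{b_2 - b_1}{2a_1}$, we obtain 

\[ k\int_{a_1}^{\frac{b_2 - b_1}{2t'}} x\,dx\,dt' = \frac{k}{2}\left[\frac{(b_2 - b_1)^2}{4t'^2} - a_1^2\right]dt'. \tag{21} \]

(c) When $-\frac{b_2 - b_1}{2a_1} \le t' \le -\frac{b_2 - b_1}{2a_2}$, we similarly obtain 

\[ k\int_{a_1}^{-\frac{b_2 - b_1}{2t'}} x\,dx\,dt' = \frac{k}{2}\left[\frac{(b_2 - b_1)^2}{4t'^2} - a_1^2\right]dt'. \tag{22} \]

From (18), (19), (20), (21) and (22), the frequency function of t' is given by 

\[ F(t') = \frac{a_2 + a_1}{2(b_2 - b_1)} \quad\text{where}\quad -\frac{b_2 - b_1}{2a_2} \le t' \le \frac{b_2 - b_1}{2a_2}, \tag{23} \]

and by 

\[ F(t') = \frac{1}{2(b_2 - b_1)(a_2 - a_1)}\left[\frac{(b_2 - b_1)^2}{4t'^2} - a_1^2\right], \tag{24} \]

where the range of t' is subject to either the inequalities, 

\[ \frac{b_2 - b_1}{2a_2} \le t' \le \frac{b_2 - b_1}{2a_1}, \]

or 

\[ -\frac{b_2 - b_1}{2a_1} \le t' \le -\frac{b_2 - b_1}{2a_2}. \]

See Fig. 4 for the general form of the F(t') frequency curve. 

If we make $a_1 = 0$, the curve becomes infinite in range. If we make not only 
$a_1 = 0$, but $(b_1 + b_2)/2 = 0$, we have, in place of (17), 

\[ y = mx. \]

In this limiting situation, if we make $a_2 = a$ and $\frac{b_2 - b_1}{2} = b$, 
(23) becomes 

\[ F(t') = \frac{a}{4b} \quad\text{for}\quad -\frac{b}{a} \le t' \le \frac{b}{a}, \tag{25} \]

and (24) becomes 

\[ F(t') = \frac{b}{4at'^2} \quad\text{for}\quad t' \ge \frac{b}{a} \quad\text{and for}\quad t' \le -\frac{b}{a}. \tag{26} \]

Then we have y' = y - mx 
and 

\[ t' = \frac{y'}{x} = t - m. \]

Further, if t' is distributed in accord with a frequency function, F(t'), the 
distribution of t = t' + m with m constant is given by 

\[ F(t - m). \]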





Hence, the probability that a random value t will fall into a range t to t + dt 
is given to within infinitesimals of higher order by 

\[ \frac{a}{4b}\,dt \quad\text{when}\quad m - \frac{b}{a} \le t \le m + \frac{b}{a}, \tag{27} \]

and by 

\[ \frac{b}{4a(t - m)^2}\,dt \quad\text{when}\quad t \ge m + \frac{b}{a} \quad\text{and when}\quad t \le m - \frac{b}{a}. \tag{28} \]

With the frequency curve given by (27) and (28) we may note that the variance 
of t becomes infinite. 

Without taking the space to continue illustrations, it is fairly obvious that a 
wide diversity of form can be given to the frequency function of the quotients 
t = y/x by relatively simple changes in the location of a sample parent population 
with reference to the origin. 



EDITORIAL 


THE FUNDAMENTAL NATURE AND PROOF OF SHEPPARD’S 

ADJUSTMENTS 

In the course of our discussion of moment adjustments, we shall have occasion 
to refer to the following lengthy distribution of discrete variates. 

TABLE 1 

Distribution of the number of items correctly recorded by 244 students in a five 
minute code transcription test* 

    Score   Freq.      Score   Freq.      Score   Freq. 
      x       f          x       f          x       f 
     64       1         94       3        119       1 
     66       2         95       5        120       2 
     68       2         96       3        121       6 
     69       1         97       3        122       2 
     70       1         98      12        123       3 
     71       3         99       4        124       2 
     72       3        100       5        125       6 
     73       3        101       6        126       3 
     76       1        102       8        127       4 
     77       2        103       6        128       2 
     78       3        104       8        130       2 
     79       1        105       9        131       1 
     80       2        106       5        132       5 
     82       2        107       3        133       1 
     83       3        108       3        134       1 
     84       2        109       4        136       1 
     85       6        110       2        138       1 
     86       3        111       4        140       1 
     87       1        112       7        141       1 
     88       2        113       5        142       2 
     89       4        114       5        144       2 
     90       4        115       7        153       1 
     91       5        116       8        155       1 
     92       2        117       3 
     93       4        118       2        Total   244 

* I am indebted to Professor J. A. Gengerelli, of the Department of Psychology of 
Univ. of California at Los Angeles, for these data. 

By selecting the provisional mean, $M_0 = 105$, we find that 

    Σxf  = -129            Σx³f = -52 005 
    Σx²f = 77 591          Σx⁴f = 69 239 951. 

Let us now form the nine possible distributions of grouped-discrete variates 
that arise from the nine possible "groupings of nine." These are presented 
in table 2. 

TABLE 2 

Distributions derived from the data of table 1 by making the nine possible 
"groupings of nine" 

First significant class interval of distribution 

     (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9) 
    64-72   63-71   62-70   61-69   60-68   59-67   58-66   57-65   56-64 

     13      10       7       6       5       3       3       1       1 
     12      15      16      16      14      14      13      15      15 
     27      23      21      20      22      21      16      14      11 
     41      41      33      32      30      28      31      29      30 
     53      54      63      61      55      52      49      45      41 
     45      45      40      38      42      45      44      48      52 
     27      27      29      34      36      39      40      42      43 
     16      19      24      25      23      24      28      30      29 
      8       6       7       6      10      10      12      11      13 
      1       2       2       4       5       6       6       7       7 
      1       2       2       2       2       2       2       2       1 
                                                                      1 


Let us now compute the values of Σxf, Σx²f, Σx³f and Σx⁴f for each of the 
distributions of table 2, selecting $M_0 = 105$ in each instance in order to facilitate 
a comparison of these results with those for table 1. Thus, in spite of what 
would otherwise be called poor computing technique, we shall use the following 
class marks as values of x for the first distribution above: -37, -28, -19, ..., 
35, 44, 53. For the second we shall likewise use -38, -29, -20, ..., 34, 43, 
52, respectively. 

TABLE 3 

Summations derived from the distributions listed in table 2, using $M_0 = 105$ 

    Dist.       Σxf        Σx²f          Σx³f           Σx⁴f 
    (1)        -181       77 149      -134 191       69 063 265 
    (2)        -218       78 466       -54 602       74 519 962 
    (3)        -111       77 769         2 889       71 465 409 
    (4)        -139       79 747       -23 311       74 171 443 
    (5)        -104       81 934        19 666       76 143 874 
    (6)         -87       80 145        16 551       72 467 541 
    (7)         -52       80 302       -36 118       71 851 930 
    (8)         -89       78 553      -101 357       68 426 497 
    (9)        -180       78 894      -180 792       73 155 150 
    Average    -129       79 217⅔      -54 585       72 362 785⅔ 


The fact that the average of the values of Σxf appearing in table 3 agrees with 
the corresponding summation of table 1 suggests 
that no adjustment of the first moment is necessary and that the variations 
in the nine values for Σxf may be regarded as accidental errors and attributed to 
grouping. An attempt to account for this phenomenon and also for the fact 
that the averages of the higher order summations of table 3 do not likewise agree 
with the corresponding summations of table 1 leads us directly to formulae for 
Sheppard's adjustments. 

For the moment, let us concentrate our attention upon a single variate, $x_0$, 
and its associated frequency, $f_{x_0}$, that are a part of a distribution of discrete 
variates, such as table 1. Suppose we were to form the k different distributions 
arising from the k possible "groupings of k." In one of these distributions, 
$x_0$ will rest in the first position of a class interval: the limits of this class are $x_0$ 
and $(x_0 + k - 1)$ and the class mark is therefore $[x_0 + \tfrac{1}{2}(k - 1)]$. The 
contribution of the variate, $x_0$, to $\Sigma x^s f$ for this particular distribution is therefore 

\[ [x_0 + \tfrac{1}{2}(k - 1)]^s f_{x_0}. \]

If $x_0$ rests in the second position of a class, the limits of this class will be 
$(x_0 - 1)$ and $(x_0 + k - 2)$ and the corresponding class mark is $[x_0 + \tfrac{1}{2}(k - 3)]$ 
and the contribution of $x_0$ to $\Sigma x^s f$ for this distribution is 

\[ [x_0 + \tfrac{1}{2}(k - 3)]^s f_{x_0}. \]

The expected value of $\Sigma x^s f$ arising from the k different groupings of variates is 
therefore, 

\[ E\left(\Sigma x^s f\right) = \frac{1}{k}\left[\Sigma_1 x^s f + \Sigma_2 x^s f + \cdots + \Sigma_k x^s f\right] \tag{1} \]

where $\Sigma_i x^s f$ refers to that distribution in which a specified $x_0$ rests in the i-th 
position in the class in which it occurs. The contribution of $x_0$ to this expected 
value is therefore 

\[ \frac{1}{k}\left\{[x_0 + \tfrac{1}{2}(k - 1)]^s + [x_0 + \tfrac{1}{2}(k - 3)]^s + [x_0 + \tfrac{1}{2}(k - 5)]^s + \cdots\right\} f_{x_0}, \tag{2} \]

this series consisting obviously of k terms. 

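Expanding each term of (2) by the binomial theorem yields 

\[
\begin{aligned}
&\frac{1}{k}\left[x_0^s + {}_sC_1\, x_0^{s-1}\left(\frac{k-1}{2}\right) + {}_sC_2\, x_0^{s-2}\left(\frac{k-1}{2}\right)^2 + {}_sC_3\, x_0^{s-3}\left(\frac{k-1}{2}\right)^3 + \cdots\right] f_{x_0} \\
&\frac{1}{k}\left[x_0^s + {}_sC_1\, x_0^{s-1}\left(\frac{k-3}{2}\right) + {}_sC_2\, x_0^{s-2}\left(\frac{k-3}{2}\right)^2 + {}_sC_3\, x_0^{s-3}\left(\frac{k-3}{2}\right)^3 + \cdots\right] f_{x_0} \\
&\frac{1}{k}\left[x_0^s + {}_sC_1\, x_0^{s-1}\left(\frac{k-5}{2}\right) + {}_sC_2\, x_0^{s-2}\left(\frac{k-5}{2}\right)^2 + {}_sC_3\, x_0^{s-3}\left(\frac{k-5}{2}\right)^3 + \cdots\right] f_{x_0}
\end{aligned}
\]

etc. 

Since s is an integer, series (2) may be written as the sum of the (s + 1) terms 
of the series 

\[ \left[x_0^s S_0 + {}_sC_1\, x_0^{s-1} S_1 + {}_sC_2\, x_0^{s-2} S_2 + {}_sC_3\, x_0^{s-3} S_3 + \cdots\right] f_{x_0}, \tag{3} \]

where 

\[ S_i = \frac{1}{k}\left[\left(\frac{k-1}{2}\right)^i + \left(\frac{k-3}{2}\right)^i + \left(\frac{k-5}{2}\right)^i + \cdots\right] \]

to k terms. 

By the Euler-Maclaurin Sum Formula we have 

\[
\sum_{x=a}^{b} x^p = \frac{1}{p+1}\left(b^{p+1} - a^{p+1}\right) + \frac{1}{2}\left(b^p + a^p\right) + \frac{B_1}{2!}\,p\left(b^{p-1} - a^{p-1}\right) - \frac{B_2}{4!}\,p^{(3)}\left(b^{p-3} - a^{p-3}\right) + \frac{B_3}{6!}\,p^{(5)}\left(b^{p-5} - a^{p-5}\right) - \cdots,
\]

where $p^{(i)} = p(p - 1)(p - 2)(p - 3) \cdots$ to i factors. In our expression for 
$S_i$, $a = -\tfrac{1}{2}(k - 1) = -b$, and therefore $S_i$ equals zero when i is an odd integer. 
For even values of i, 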



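\[ S_i = \frac{2}{k}\left\{\frac{(k-1)^i (k+i)}{2^{i+1}(1+i)} + \frac{B_1}{2!}\, i\left(\frac{k-1}{2}\right)^{i-1} - \frac{B_2}{4!}\, i^{(3)}\left(\frac{k-1}{2}\right)^{i-3} + \cdots\right\} \tag{4} \]

so that 

\[
\begin{aligned}
S_0 &= 1 \\
S_2 &= \tfrac{1}{12}(k^2 - 1) \\
S_4 &= \tfrac{1}{240}(k^2 - 1)(3k^2 - 7)
\end{aligned}
\]

etc. 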

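Since expression (3) represents the contribution of any variate, $x_0$, to the 
expected value defined by (1), we may obtain by summation 

\[ E\left(\Sigma x^s f\right) = \Sigma x^s f + {}_sC_2 \cdot S_2 \cdot \Sigma x^{s-2} f + {}_sC_4 \cdot S_4 \cdot \Sigma x^{s-4} f + \cdots. \tag{5} \]

To illustrate: if we desire to shorten the distribution of table 1 by forming class 
intervals of dimension 9, 

\[ S_2 = \tfrac{1}{12}(9^2 - 1) = \tfrac{20}{3}, \qquad S_4 = \tfrac{1}{240}(9^2 - 1)(3 \cdot 9^2 - 7) = \tfrac{236}{3}, \]

and by formula (5), 

\[
\begin{aligned}
E(\Sigma x f) &= \Sigma x f = -129 \\
E(\Sigma x^2 f) &= \Sigma x^2 f + {}_2C_2 \cdot S_2 \cdot \Sigma f = 77591 + \tfrac{20}{3} \cdot 244 = 79217\tfrac{2}{3} \\
E(\Sigma x^3 f) &= \Sigma x^3 f + {}_3C_2 \cdot S_2 \cdot \Sigma x f = -52005 + 3 \cdot \tfrac{20}{3}(-129) = -54585 \\
E(\Sigma x^4 f) &= \Sigma x^4 f + {}_4C_2 \cdot S_2 \cdot \Sigma x^2 f + {}_4C_4 \cdot S_4 \cdot \Sigma f \\
&= 69239951 + 6 \cdot \tfrac{20}{3} \cdot 77591 + \tfrac{236}{3} \cdot 244 = 72362785\tfrac{2}{3}.
\end{aligned}
\]

Since these expected values are identical with those computed directly in table 3, 
we see that formula (5) provides the adjustments necessary to eliminate the 
effect of the systematic errors caused by grouping. 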
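The agreement between formula (5) and the averages of table 3 can be reproduced mechanically. The sketch below re-enters the frequencies of table 1, forms the nine groupings of nine with class marks measured from the provisional mean 105, and compares the averaged summations with the right-hand side of (5) in exact rational arithmetic. The function names are invented for the illustration.

```python
from fractions import Fraction
from math import comb

# Table 1: score -> frequency (244 students).
TABLE1 = {
    64: 1, 66: 2, 68: 2, 69: 1, 70: 1, 71: 3, 72: 3, 73: 3, 76: 1, 77: 2,
    78: 3, 79: 1, 80: 2, 82: 2, 83: 3, 84: 2, 85: 6, 86: 3, 87: 1, 88: 2,
    89: 4, 90: 4, 91: 5, 92: 2, 93: 4, 94: 3, 95: 5, 96: 3, 97: 3, 98: 12,
    99: 4, 100: 5, 101: 6, 102: 8, 103: 6, 104: 8, 105: 9, 106: 5, 107: 3,
    108: 3, 109: 4, 110: 2, 111: 4, 112: 7, 113: 5, 114: 5, 115: 7, 116: 8,
    117: 3, 118: 2, 119: 1, 120: 2, 121: 6, 122: 2, 123: 3, 124: 2, 125: 6,
    126: 3, 127: 4, 128: 2, 130: 2, 131: 1, 132: 5, 133: 1, 134: 1, 136: 1,
    138: 1, 140: 1, 141: 1, 142: 2, 144: 2, 153: 1, 155: 1,
}
M0, K = 105, 9

def power_sums(dist, smax=4):
    """[sum f, sum x f, sum x^2 f, ...] for a dict x -> f."""
    return [sum(f * x**s for x, f in dist.items()) for s in range(smax + 1)]

def grouped(dist, k, shift):
    """Group into classes of width k; the first class starts at min(x) - shift."""
    start = min(dist) - shift
    out = {}
    for x, f in dist.items():
        lower = start + k * ((x - start) // k)
        mark = Fraction(2 * lower + k - 1, 2)      # class mark
        out[mark] = out.get(mark, 0) + f
    return out

def S(i, k):
    """Mean i-th power of the k offsets (k-1)/2, (k-3)/2, ..., -(k-1)/2."""
    return sum(Fraction(k - 1 - 2 * j, 2) ** i for j in range(k)) / k

ungrouped = {score - M0: f for score, f in TABLE1.items()}
raw = power_sums(ungrouped)
per_grouping = [power_sums(grouped(ungrouped, K, j)) for j in range(K)]
average = [sum(col) / K for col in zip(*per_grouping)]
expected = [
    sum(comb(s, i) * S(i, K) * raw[s - i] for i in range(0, s + 1, 2))
    for s in range(5)
]
print(raw[:3], per_grouping[0][:3], average[:3])
```

Because the k groupings place each variate once in every position of a class, the identity holds exactly for any frequency table, not only for these data.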

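Dividing both sides of (5) by Σf yields 

\[ E(\nu_s') = \mu_s' + {}_sC_2 \cdot S_2 \cdot \mu_{s-2}' + {}_sC_4 \cdot S_4 \cdot \mu_{s-4}' + {}_sC_6 \cdot S_6 \cdot \mu_{s-6}' + \cdots, \tag{6} \]

that is 

\[
\begin{aligned}
E(\nu_1') &= \mu_1' \\
E(\nu_2') &= \mu_2' + \tfrac{1}{12}(k^2 - 1) \\
E(\nu_3') &= \mu_3' + \tfrac{1}{4}(k^2 - 1)\,\mu_1' \\
E(\nu_4') &= \mu_4' + \tfrac{1}{2}(k^2 - 1)\,\mu_2' + \tfrac{1}{240}(k^2 - 1)(3k^2 - 7) \\
E(\nu_5') &= \mu_5' + \tfrac{5}{6}(k^2 - 1)\,\mu_3' + \tfrac{1}{48}(k^2 - 1)(3k^2 - 7)\,\mu_1'
\end{aligned}
\]

etc. 

In numerical computations we generally prefer to select the class interval 
as the unit of x and in this case we have 

\[
\begin{aligned}
E(\nu_1') &= \mu_1' \\
E(\nu_2') &= \mu_2' + \tfrac{1}{12}\left(1 - \tfrac{1}{k^2}\right) \\
E(\nu_3') &= \mu_3' + \tfrac{1}{4}\left(1 - \tfrac{1}{k^2}\right)\mu_1' \\
E(\nu_4') &= \mu_4' + \tfrac{1}{2}\left(1 - \tfrac{1}{k^2}\right)\mu_2' + \tfrac{1}{240}\left(1 - \tfrac{1}{k^2}\right)\left(3 - \tfrac{7}{k^2}\right)
\end{aligned}
\]

etc. 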

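Ordinarily we are interested in estimating the values of the moments that 
would have been obtained if we had not used the time-saving device of grouping 
the variates and therefore we solve the previous set of equations for the moments 
of the ungrouped distribution and obtain 

\[
\begin{aligned}
\mu_1' &= E(\nu_1') \\
\mu_2' &= E(\nu_2') - \tfrac{1}{12}\left(1 - \tfrac{1}{k^2}\right) \\
\mu_3' &= E(\nu_3') - \tfrac{1}{4}\left(1 - \tfrac{1}{k^2}\right)E(\nu_1') \\
\mu_4' &= E(\nu_4') - \tfrac{1}{2}\left(1 - \tfrac{1}{k^2}\right)E(\nu_2') + \tfrac{1}{240}\left(1 - \tfrac{1}{k^2}\right)\left(7 - \tfrac{3}{k^2}\right)
\end{aligned} \tag{7}
\]

etc. 

In general we may write, corresponding to formula (6), 

\[ \mu_s' = E(\nu_s') - {}_sC_2 \cdot P_2 \cdot E(\nu_{s-2}') + {}_sC_4 \cdot P_4 \cdot E(\nu_{s-4}') - \cdots \tag{8} \]

where 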




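\[
\begin{aligned}
P_2 &= \frac{1}{12}\left(1 - \frac{1}{k^2}\right) \\
P_4 &= \frac{1}{240}\left(1 - \frac{1}{k^2}\right)\left(7 - \frac{3}{k^2}\right) \\
P_6 &= \frac{1}{1344}\left(1 - \frac{1}{k^2}\right)\left(31 - \frac{18}{k^2} + \frac{3}{k^4}\right) \\
P_8 &= \frac{1}{11520}\left(1 - \frac{1}{k^2}\right)\left(381 - \frac{239}{k^2} + \frac{55}{k^4} - \frac{5}{k^6}\right) \\
P_{10} &= \frac{1}{33792}\left(1 - \frac{1}{k^2}\right)\left(2555 - \cdots\right) \\
P_{12} &= \frac{1}{5591040}\left(1 - \frac{1}{k^2}\right)\left(1414477 - \cdots\right)
\end{aligned}
\]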




In actual problems we do not know the exact values of the expectations 
involved in formulae (7) and (8), and are forced to obtain mere approximations 
by utilizing in their stead the corresponding moments computed from the 
single chance grouped distribution. These approximations correspond to those 
employed in the theory of probable error, namely, substitutions of the moments 
derived from a single sample for the corresponding expected moments of the 
parent population. 

The adjustments so far considered may properly be referred to as Sheppard's 
adjustments about a fixed point. At first thought it might appear that we might 
obtain corresponding formulae for the expectations of moments about the mean 
by merely dropping the primes in formula (6) and obtain, for example, 

\[ \mu_2 = E(\nu_2) - \tfrac{1}{12}(k^2 - 1), \]

but unfortunately this is not true. For example, the exact value for the variance 
of the distribution of table 1 is 18915563/244². Using the summations of 
table 3 and computing the variance for each of the nine groupings yields 

\[
\begin{aligned}
E(\nu_2) &= \frac{1}{9 \cdot 244^2}\,[18791595 + 19098180 + 18963315 + 19438947 \\
&\qquad + 19981080 + 19547811 + 19590984 + 19159011 + 19217736] \\
&= 19309851/244^2.
\end{aligned} \tag{9}
\]

Since $\tfrac{1}{12}(k^2 - 1) = \tfrac{1}{12}(9^2 - 1) = 20/3$ we see that 

\[ \mu_2 > E(\nu_2) - \tfrac{1}{12}(k^2 - 1). \]

In the theory of sampling we differentiate between the standard errors of 
moments about a fixed point and the standard error of moments about the mean 
of the sample. Apparently writers on the subject of Sheppard's adjustments 
have overlooked the case of adjustments about the mean, although the solution 
for the second moment is readily obtained as follows: 

\[
\begin{aligned}
E(\nu_2) &= E(\nu_2' - M^2) = E(\nu_2') - E(M^2) \\
&= \mu_2' + \tfrac{1}{12}(k^2 - 1) - \frac{1}{k}\left(M_1^2 + M_2^2 + \cdots + M_k^2\right)
\end{aligned}
\]

where $M_i$ represents the mean of the i-th of the k different grouped distributions. 
Since 

\[ \mu_2 = \mu_2' - \mu_1'^2 = \mu_2' - \left[\frac{1}{k}\left(M_1 + M_2 + \cdots + M_k\right)\right]^2, \]

\[ E(\nu_2) = \mu_2 + \tfrac{1}{12}(k^2 - 1) - \left[\frac{M_1^2 + M_2^2 + \cdots + M_k^2}{k} - \left(\frac{M_1 + M_2 + \cdots + M_k}{k}\right)^2\right]. \]

But since for any set of k variates 

\[ \sigma_x^2 = \frac{\Sigma x^2}{k} - \left(\frac{\Sigma x}{k}\right)^2, \]

we have that 

\[ E(\nu_2) = \mu_2 + \tfrac{1}{12}(k^2 - 1) - \sigma_M^2. \tag{10} \]

Referring back to table 3 we find that 

\[ \sigma_M^2 = \frac{7856}{3 \cdot (244^2)} \]

and the numerical results now satisfy equation (10). 
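Equation (10) can be confirmed from the first two columns of table 3 alone. The sketch below takes those printed summations as given, computes the variance of each grouped distribution about its own mean, and checks the identity exactly; the variable names are chosen for the illustration only.

```python
from fractions import Fraction

N, K = 244, 9
SUM_X_UNGROUPED, SUM_X2_UNGROUPED = -129, 77591          # table 1, M0 = 105
SUM_X = [-181, -218, -111, -139, -104, -87, -52, -89, -180]          # table 3
SUM_X2 = [77149, 78466, 77769, 79747, 81934, 80145, 80302, 78553, 78894]

# Variance of the ungrouped distribution about its mean.
mu2 = Fraction(SUM_X2_UNGROUPED, N) - Fraction(SUM_X_UNGROUPED, N) ** 2

# Variance of each grouped distribution about its own mean, then the average.
nu2 = [Fraction(s2, N) - Fraction(s1, N) ** 2 for s1, s2 in zip(SUM_X, SUM_X2)]
expected_nu2 = sum(nu2) / K

# Variance of the nine grouped means M_i about their own average.
means = [Fraction(s1, N) for s1 in SUM_X]
mean_of_means = sum(means) / K
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / K

sheppard = Fraction(K * K - 1, 12)
print(mu2, expected_nu2, var_of_means)
```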

For the benefit of those interested in unsolved problems of mathematical 
statistics we may say that nothing appears to have been written as yet on the 
most important problem associated with the systematic errors due to grouping. 
It is of course desirable to eliminate these systematic errors introduced by 
grouping, but it is even more important to investigate the distribution of the 
accidental errors that remain after the systematic errors have been eliminated. 
For example it is gratifying to know that no systematic errors are present in the 
Σxf column of table 3 and that equation (6) will enable us to add a constant to 
each summation of the Σx³f column so that the mean of these adjusted values 
will agree with the value Σx³f = -52005 obtained in table 1. It is rather disconcerting, 
however, to realise that in actual practice we may in the case of 
discrete variates and must in the case of continuous variates select an arbitrary 
set of class limits for our recorded data, and that after adjustments for grouping 
have been made, our estimates of the true values of the moments of the distribution 
will, as in table 3, depend so much upon the choice of these limits. 
Thus, the standard error of the mean attributed to grouping is 

\[ \sigma_M = \sqrt{\frac{7856}{3 \cdot 244^2}}, \]

which is about twenty percent as large as the approximation for the standard 
error of the mean due to sampling from an infinite parent population, namely, 

\[ \frac{\sigma}{\sqrt{N}}. \]

If one will take the trouble to compute the values of $\mu_3$ and $\mu_4$ for each of the 
distributions of table 2, utilizing the summations of table 3, and then compute 
and compare the values of $\sigma_{\mu_3}$ and $\sigma_{\mu_4}$ due to grouping with the corresponding 
functions associated with sampling, he will realize the seriousness of the situation. 

Summary 

The formula for Sheppard's adjustments for distributions of grouped discrete 
variates was first given without proof in the Editorial of Vol. 1, No. 1 of the 
Annals (page 111). The method used to develop the general formula was 
extremely laborious and paralleled the method used for the case of continuous 
variates in the Handbook of Mathematical Statistics, Chapter 7, except that the 
calculus of finite differences was employed. A more satisfactory proof of this 
formula was presented by Dr. J. R. Abernethy in Vol. 4, No. 4 of the Annals 
in an article entitled "On the Elimination of Systematic Errors Due to Grouping." 
An extremely elegant development of the same formula and an extension to the 
case of two variables appears elsewhere in this volume by Professor C. C. Craig. 
From the point of view of expectations, all of these developments are adjustments 
about a fixed point, although this fixed point may be selected arbitrarily 
at the mean of the distribution in question. The obtaining of formulae for the 
adjustments about the mean of each grouping and the distribution of the 
accidental errors that remain after these systematic errors have been removed 
has apparently been neglected to date and should interest students of mathematical 
statistics. 

From a mathematical standpoint, the development of this paper is the 
simplest of all that have appeared to date: the adjustments for the first four 
moments can be worked out with the aid of the binomial considerations leading 
to formula (3) and the following well known formulae for the sums of the powers 
of the first n integers: 

\[
\begin{aligned}
S_1 &= \frac{n(n + 1)}{2}, & S_3 &= \frac{n^2(n + 1)^2}{4}, \\
S_2 &= \frac{n(n + 1)(2n + 1)}{6}, & S_4 &= \frac{n(n + 1)(2n + 1)(3n^2 + 3n - 1)}{30}.
\end{aligned}
\]



NATURE AND PROOF OF SHEPPARD'S ADJUSTMENTS 


163 


One should note that the condition of high contact is not required in this
paper or in the developments of Abernethy or Craig. The results of the three
preceding papers agree with those obtained about a fixed point in this paper,
but fail to hold for the case of expectations about the mean, if we accept the
following definition:


$$(m_{s:1} + m_{s:2} + \cdots + m_{s:k}), \qquad (s = 2, 3, \cdots)$$

where $m_{s:t}$ designates the $s$-th moment computed about the mean of the $t$-th
grouped distribution, $(1 \le t \le k)$.

H. C. Carver. 




ON A GENERAL SOLUTION FOR THE PARAMETERS OF ANY 
FUNCTION WITH APPLICATION TO THE THEORY OF 
ORGANIC GROWTH 

By Harry Sylvester Will 

Part I 

I. The Problem Stated. A type of problem which continually arises in the
ordinary course of statistical analysis is that of determining the numerical values
of the parameters of a function used to represent a series of observational data.
In mathematical terminology, the problem may be stated as follows:

Given, the observational series $Y_0, Y_1, \cdots, Y_{n-1}$.

Assumed, the function $y = f(x, a, b, c, \cdots)$.

To find, the numerical values of the parameters $a, b, c, \cdots$.

If the function $f(x, a, b, c, \cdots)$ is linear in the parameters, the desired solution
is easily obtained by familiar methods. In cases where the function is not
linear, the standard procedure is to reduce it to the linear form by expansion
into Taylor's series, thus:

$$f(x, a, b, c) = f(x, a_0b_0c_0) + f_a(x, a_0b_0c_0)\cdot\Delta a + f_b(x, a_0b_0c_0)\cdot\Delta b + f_c(x, a_0b_0c_0)\cdot\Delta c, \tag{1}$$

where $a = a_0 + \Delta a$, $b = b_0 + \Delta b$, $c = c_0 + \Delta c$.

The use of this method suffers from the excessive labor involved as the number
of parameters to be determined increases. In cases where satisfactory values
of the first approximations $a_0b_0c_0$ are not obtainable, the solution becomes
impossible. The basic difficulty arises from the consideration that the Taylor
theorem requires that the increments $\Delta a$, $\Delta b$, $\Delta c$ shall be very small quantities.

A method of successive approximation which makes feasible the reduction of 
gross errors in the corrections will, I take it, be of considerable interest to 
mathematical statisticians. Let us, therefore, proceed to the development 
of a technique which accomplishes precisely this result. 

II. The Theta Technique. Let us begin our development with the following
restatement of the technical problem involved:

Given, the observational series $Y_0, Y_1, \cdots, Y_{n-1}$.

Assumed, the function $y = f(x, (a_0 + \theta_1\Delta a), (b_0 + \theta_2\Delta b), (c_0 + \theta_3\Delta c))$.

To find, the values of $\theta_1$, $\theta_2$, $\theta_3$.

In this set of relations, $a_0$, $b_0$, $c_0$ and $\Delta a$, $\Delta b$, $\Delta c$ are known quantities; while
$\theta_1$, $\theta_2$ and $\theta_3$ are each assumed not to exceed $\pm 1$ in value. It follows, therefore,
that the adjusted values of $a$, $b$, and $c$ lie within the bounds $a_0 \pm \Delta a$, $b_0 \pm \Delta b$,
$c_0 \pm \Delta c$. We may, then, write the following:

$$a_1 = a_0 - \Delta a; \qquad a_2 = a_0 + \Delta a.$$

$$b_1 = b_0 - \Delta b; \qquad b_2 = b_0 + \Delta b. \tag{2}$$

$$c_1 = c_0 - \Delta c; \qquad c_2 = c_0 + \Delta c.$$

The values of $\theta_1$, $\theta_2$ and $\theta_3$ are determined by the following procedure:

First, form the function $y$ from all possible combinations of $a_1a_2$, $b_1b_2$, $c_1c_2$,
thus:

$$y_{111} = f(x, a_1b_1c_1).$$

$$y_{112} = f(x, a_1b_1c_2). \tag{3}$$

$$\cdots$$

$$y_{222} = f(x, a_2b_2c_2).$$

In the case of $p$ parameters, we can evidently form $2^p$ distinct sets of $n$ values
for the function $y_{\cdots}$. Since the assigned values of parameters are mere
approximations to their true values, each computed set of values for the function $y_{\cdots}$
will differ from the true values $y = f(x, abc)$.

Second, form the theoretical residuals $y_{\cdots} - y$, and then compute the
corresponding standard errors of estimate $\sigma_{\cdots}$. There will, accordingly, be $2^p$ values
of $\sigma$ determined, each value being a measure of the error committed in assuming
the corresponding approximations to parameters; thus, $\sigma_{111}$ measures the errors
committed in assuming the combination $a_1b_1c_1$; $\sigma_{112}$ measures the errors
committed in assuming $a_1b_1c_2$; $\cdots$; $\sigma_{222}$ measures the errors committed in assuming
$a_2b_2c_2$.

Third, taking the squared reciprocal of $\sigma$ as a measure of the reliability of a
given determination of $y_{\cdots}$ from the parameters $a_1b_1c_1$, we may form the
following comparative tests of the reliability of the $2^p$ sets of the values of $y_{\cdots}$, thus:

$$\omega_{111} = \sigma_{111}^{-2} : (\sigma_{111}^{-2} + \sigma_{112}^{-2} + \cdots + \sigma_{222}^{-2}) = \sigma_{111}^{-2} : \Sigma\sigma^{-2}.$$

$$\omega_{112} = \sigma_{112}^{-2} : \Sigma\sigma^{-2}. \tag{4}$$

$$\cdots$$

$$\omega_{222} = \sigma_{222}^{-2} : \Sigma\sigma^{-2}.$$

Omega, we shall term the test constant. Obviously, $\Sigma\omega_{\cdots} = 1$.

Fourth, assuming three parameters, let us tabulate the possible subscripts of
omega according to the following scheme:

    ω(a₁)   ω(a₂)   ω(b₁)   ω(b₂)   ω(c₁)   ω(c₂)

     111     211     111     121     111     112
     121     221     211     221     211     212
     112     212     112     122     121     122
     122     222     212     222     221     222







In this table, the subscripts are in the order of $abc$; so that 111 denotes
$\omega(a_1b_1c_1)$; 112 denotes $\omega(a_1b_1c_2)$; etc. Comparing columns $\omega(a_1)$ and $\omega(a_2)$, we
observe that the $bc$ subscripts are identical for both; while the $a_1$ subscripts of
the first column are replaced by the $a_2$ subscripts in the second column. Again,
comparing columns $\omega(b_1)$ and $\omega(b_2)$, we see that the $ac$ subscripts are identical
for both; while the $b_1$ subscripts of the one column are replaced by the $b_2$
subscripts in the other. Finally, comparing columns $\omega(c_1)$ and $\omega(c_2)$, we note that
the $ab$ subscripts are identical for both; while the $c_1$ subscripts of the one column
are replaced by $c_2$ subscripts in the other.

Fifth, let us form the column summations $\Sigma\omega(a_1)$, $\Sigma\omega(a_2)$; $\Sigma\omega(b_1)$, $\Sigma\omega(b_2)$; and
$\Sigma\omega(c_1)$, $\Sigma\omega(c_2)$. Since the columns $\omega(a_1)$ and $\omega(a_2)$ differ only with respect to the
$a$ subscripts, the difference in value between the sums $\Sigma\omega(a_1)$ and $\Sigma\omega(a_2)$ can be
due to differences in value between $a_1$ and $a_2$ only, and is not at all affected
by differences in value between $b_1b_2$ and $c_1c_2$. $\Sigma\omega(a_1)$ and $\Sigma\omega(a_2)$ may, therefore,
be regarded as the weights of $a_1$ and $a_2$, to be used in determining the adjusted
value of $a$; for $\Sigma\omega(a_1) + \Sigma\omega(a_2) = 1$.

We may, then, write the following relations: 

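$$a = \Sigma\omega(a_1)\cdot a_1 + \Sigma\omega(a_2)\cdot a_2 = \Sigma\omega(a_1)\cdot(a_0 - \Delta a) + \Sigma\omega(a_2)\cdot(a_0 + \Delta a)$$

$$= (\Sigma\omega(a_1) + \Sigma\omega(a_2))\cdot a_0 + (\Sigma\omega(a_2) - \Sigma\omega(a_1))\cdot\Delta a = a_0 + \theta(a)\cdot\Delta a. \tag{5}$$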

Since precisely similar reasoning applies to the parameters $b_1$, $b_2$ and $c_1$, $c_2$,
we have the following definitive formulas for computing the values of theta:

$$\theta(a) = \Sigma\omega(a_2) - \Sigma\omega(a_1).$$

$$\theta(b) = \Sigma\omega(b_2) - \Sigma\omega(b_1). \tag{6}$$

$$\theta(c) = \Sigma\omega(c_2) - \Sigma\omega(c_1).$$

As the adjusted values of parameters, we have:

$$a = a_0 + \theta(a)\cdot\Delta a.$$

$$b = b_0 + \theta(b)\cdot\Delta b. \tag{7}$$

$$c = c_0 + \theta(c)\cdot\Delta c.$$

In this development of the theta technique, we have determined $\sigma_{\cdots}$ from the
theoretical residuals $y_{\cdots} - y$. This has served well the purposes of exposition;
but, since the true values of the function $y$ are unknown, we must, in practice,
compute $\sigma_{\cdots}$ from the observational residuals $y_{\cdots} - Y$. Later in the memoir,
it will be shown how the computation of $\theta$ may, in numerous cases, be
considerably abridged.
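
The five steps of Section II can be sketched in code. What follows is a minimal illustration under stated assumptions, not the author's own computing scheme: the helper name `theta_step`, the example function `growth`, and the sample data are all hypothetical, the standard error is taken as the root mean square of the observational residuals, and a small floor on the squared error is added only to avoid division by zero.

```python
import itertools
import math


def theta_step(f, xs, ys, centers, deltas):
    """One pass of the theta technique for p parameters (hypothetical helper).

    f(x, *params) is the assumed function, (xs, ys) the observations,
    centers the first approximations (a0, b0, ...), and deltas the
    arbitrary corrections (delta a, delta b, ...).
    """
    p = len(centers)
    reliability = {}
    for combo in itertools.product((0, 1), repeat=p):  # 0: lower, 1: upper
        params = [c + (2 * j - 1) * d
                  for c, d, j in zip(centers, deltas, combo)]
        # Squared standard error of estimate from observational residuals.
        sigma2 = sum((f(x, *params) - y) ** 2
                     for x, y in zip(xs, ys)) / len(xs)
        # Squared reciprocal of sigma; the floor is an assumption that
        # guards against division by zero when a corner fits exactly.
        reliability[combo] = 1.0 / max(sigma2, 1e-300)
    total = sum(reliability.values())
    omega = {combo: r / total for combo, r in reliability.items()}

    thetas = []
    for k in range(p):
        upper = sum(w for combo, w in omega.items() if combo[k] == 1)
        lower = sum(w for combo, w in omega.items() if combo[k] == 0)
        thetas.append(upper - lower)  # form of equation (6)
    # Adjusted values, form of equation (7).
    adjusted = [c + t * d for c, t, d in zip(centers, thetas, deltas)]
    return thetas, adjusted


def growth(x, a, b):
    """Illustrative two-parameter function, not taken from the memoir."""
    return a * math.exp(b * x)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [growth(x, 2.0, 0.5) for x in xs]
thetas, adjusted = theta_step(growth, xs, ys, [1.9, 0.45], [0.3, 0.1])
```

Because the weights sum to unity, each theta necessarily lies between -1 and +1, so a single pass can never move a parameter outside its assumed bounds; repeated passes with fresh bounds are needed when the first approximations are poor.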

Part II

III. The Principle of Malthus. Since a determination of the numerical
parameters of a given function by means of the theta technique must, at best,
involve a considerable amount of computation, I have chosen for purposes of
demonstration a problem which is of much interest in itself. This problem, we
shall state in the form of two questions:

First, what is the most appropriate mathematical form of the law of organic 
growth? 

Second, how may the parameters of the indicated function be computed? 

Thomas R. Malthus, in his famous essay on The Principle of Population Growth
assumed that the proportional growth of human populations is properly defined
by the differential equation,

$$\frac{1}{p}\cdot\frac{dp}{dt} = -b, \tag{8}$$

where $p$ is the population under consideration, $t$ is the measure of time, and $b$ is
the stable or geometric rate of growth.

This formula has been destructively criticised on the ground that it fails 
wholly to give a mathematical description of the manner in which population 
growth is kept within bounds. So far as any implication of the formula is 
concerned, populations may grow to infinite magnitudes. An attempt to 
represent growth by its use must, therefore, result in a succession of
discontinuities which are incompatible with the observed facts of organic growth.

IV. The Symmetric Logistic. In three memoirs published in 1838, 1845
and 1847, it was suggested by M. Verhulst, Professor of Mathematics in the
Ecole Militaire in Brussels, that the rate of population growth might be stated
as a function of the population itself. Assuming the limiting value of $p$ to be
$H$, this conception of the growth rate Verhulst expressed by the differential
equation,

$$\frac{1}{p}\cdot\frac{dp}{dt} = -b(1 - pH^{-1}). \tag{9}$$

Since this equation expresses proportional growth as a linear function of p, 
it is the simplest relation of its kind that may be conceived. In representing the 
rate of growth as a quantity which approaches zero as the population approaches 
its limiting value, it makes, indeed, a significant advance over the Malthusian 
formula. Nevertheless, the equation is subject to an interesting limitation, 
the nature of which is made evident by an examination of the integral form of the 
function, namely: 

$$p = H : [1 + e^{a+bt}]. \tag{10}$$

This we shall now prove to be rotationally symmetric with respect to the point 
of inflection. 

Differentiating equation (9) a second time, we have,

$$-b^{-1}\,d^2p = d[p(1 - H^{-1}p)]\,dt,$$

so that

$$d^2p = -b[(1 - H^{-1}p)\,dp - H^{-1}p\,dp]\,dt = p^{-1}dp^2 + bH^{-1}p\,dp\,dt.$$

Hence,

$$\frac{d^2p}{dt^2} = b^2p(1 - H^{-1}p)^2 - b^2H^{-1}p^2(1 - H^{-1}p).$$

Setting $\frac{d^2p}{dt^2} = 0$, we get,

$$1 - 2H^{-1}p = 0.$$

Or

$$p = H/2, \tag{11}$$

which gives the value of $p$ at the point of inflection.

Substituting for $p$ from (10), and solving for $t$, we have,

$$t_i = -a/b, \tag{12}$$

where $t_i$ is the point of inflection of the function $p$.

Denoting the magnitude of the population at time $t_i$ by $p_i$, its magnitude at
time $t_{i+k}$ by $p_{i+k}$, and its magnitude at the time $t_{i-k}$ by $p_{i-k}$, we have,

$$p_i = H : [1 + e^{a+bt_i}] = H/2. \tag{13}$$

$$p_{i+k} = H : [1 + e^{a+b(t_i+k)}] = H : [1 + e^{bk}]. \tag{14}$$

$$p_{i-k} = H : [1 + e^{a+b(t_i-k)}] = H : [1 + e^{-bk}]. \tag{15}$$

Measuring $p$ in units of $H$ and setting $u = e^{bk}$, we may rewrite these last
three equations as follows:

$$H^{-1}p_i = 1/2.$$

$$H^{-1}p_{i+k} = 1 : [1 + u].$$

$$H^{-1}p_{i-k} = 1 : [1 + u^{-1}].$$

On the hypothesis of rotational symmetry, we have, by subtraction,

$$H^{-1}p_{i+k} - 1/2 = 1/2 - H^{-1}p_{i-k}.$$

In proof, we have:

$$1 : [1 + u] = 1 - 1 : [1 + u^{-1}]$$

$$= u^{-1} : [1 + u^{-1}]$$

$$= 1 : [u + 1].$$

q. e. d.
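
The symmetry just proved can also be checked numerically. The sketch below assumes the integral form $p = H : [1 + e^{a+bt}]$ with the inflection at $t_i = -a/b$; the parameter values are illustrative only and the function name `logistic` is a hypothetical helper.

```python
import math


def logistic(t, H, a, b):
    """Symmetric logistic in the form p = H / (1 + exp(a + b t))."""
    return H / (1.0 + math.exp(a + b * t))


# Illustrative values only; they are not the fitted constants of the memoir.
H, a, b = 200.0, 1.7, -0.14
t_i = -a / b  # point of inflection, where p = H / 2

gaps = []
for k in (0.5, 3.0, 10.0):
    above = logistic(t_i + k, H, a, b) - H / 2.0
    below = H / 2.0 - logistic(t_i - k, H, a, b)
    gaps.append(abs(above - below))
```

Each entry of `gaps` should vanish up to rounding error, which is the statement that the curve is unchanged by a half-turn about the point $(t_i, H/2)$.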


Part III

V. Criticisms of the Logistic. Because of its symmetric form, many critics
have called into question the finality of the logistic as a universal
representation of population growth. That it applies in particular cases, they
contend, is no reason for holding that it must apply in general. Professors Raymond
Pearl and Lowell J. Reed of Johns Hopkins University, to whom we are
indebted for the rediscovery of the earlier researches of Verhulst, have proposed,
as the proper form of the generalized growth curve, the following function:

$$p = H : [1 + e^{a+bt+ct^2+dt^3}]. \tag{16}$$

In their view, this equation is suited not only to representing a single cycle of 
growth, but two successive cycles as well. This claim, however, must be 
rejected; for, if true, it would mean that one cycle of growth is predictable from 
another, a circumstance which is clearly inconsistent with the assumptions laid 
down by these same investigators. 

Moreover, so far as I can learn from their published writings, these authors 
have never considered the implications of the differential form of the function 
they propose. 

Differentiating (16), we have,

$$\frac{1}{p}\cdot\frac{dp}{dt} = -(b + 2ct + 3dt^2)(1 - H^{-1}p).$$

Here, we find the stable growth constant of Malthus replaced by an expression
which is quadratic in $t$. This means that, for a population which is freed of a
restraining limit, proportional growth tends generally toward infinite values.
If there are any facts to support such a conception of organic growth, I do not
know what they are, and must, perforce, reject the contention that equation
(16) is the generalized form of the Verhulst function.

VI. Fundamental Assumptions. In order to represent the phenomenon 
of population growth mathematically, I hold the following assumptions to be 
necessary: 

(a) Under favoring conditions, population may increase at a constant
geometric rate.

(b) Under all circumstances, the rate of growth must be a finite and continuous 
quantity. 

(c) The magnitude of a population is always a positive, real number. 

(d) The growth of population tends toward restriction within definite bounds. 

(e) The growth of population is a function of time. 

(f) The basic conditions of growth are free of cataclysmic disturbances. 

The first of these assumptions is given in recognition of well known facts
concerning organic growth. The second is necessary because, even when the
size of a population is freed of definite restriction, the pattern of growth is not
necessarily geometric. The third assumption affirms the absurdity of
representing a population as a negative or infinite quantity. The fourth merely asserts
the indisputable fact that the organism must always grow in a finite environment.
The fifth gives place to the concept of growth as the resultant of a complex of
causes, no one of which can be isolated as an entirely independent variable;
while the final assumption recognizes that major disturbing influences may
profoundly affect the course of growth.


VII. The Skew Logistic. In accord with our fundamental assumptions, we
may form the following differential equations:

$$\frac{1}{p'}\cdot\frac{dp'}{dt} = -[b + sm\cdot\cos(m(t+q))]\,[1 - H^{-1}p'] \qquad \text{Type } \alpha$$

$$= -[b + 2sm^2(t+q) : (1 + m^2(t+q)^2)]\,[1 - H^{-1}p'] \qquad \text{Type } \beta \tag{17}$$

$$= -[b + sm^2(t+q) : \sqrt{1 + m^2(t+q)^2}]\,[1 - H^{-1}p'] \qquad \text{Type } \gamma$$

In these equations, $p' = p - L$, and measures $p$ from its lower limit as origin.
On separating variables, the following integrations may be performed:

$$-\int [dp' : p'(1 - H^{-1}p')] = -\log[p' : (1 - H^{-1}p')] = \log[(H - p') : (Hp')].$$

Writing $z = m(t + q)$, $dz = m\,dt$; so that we have:

$$b\int dt + s\int \cos z\,dz = A + bt + s\cdot\sin z.$$

$$b\int dt + 2s\int [z : (1 + z^2)]\,dz = A + bt + s\cdot\log(1 + z^2).$$

$$b\int dt + s\int [z : \sqrt{1 + z^2}]\,dz = A + bt + s\sqrt{1 + z^2}.$$

From these integrals, we form the following equations:

$$\log[(H - p') : (Hp')] = A + bt + s\cdot\sin[m(t+q)].$$

$$\log[(H - p') : (Hp')] = A + bt + s\cdot\log[1 + m^2(t+q)^2].$$

$$\log[(H - p') : (Hp')] = A + bt + s\cdot\sqrt{1 + m^2(t+q)^2}.$$

We have, finally, on taking antilogarithms and making the substitutions
$p = p' + L$, $a = A + \log H$:

$$p = L + H : [1 + e^{a+bt+s\cdot\sin(m(t+q))}] \qquad \text{Type } \alpha$$

$$p = L + H : [1 + e^{a+bt+s\cdot\log(1+m^2(t+q)^2)}] \qquad \text{Type } \beta \tag{18}$$

$$p = L + H : [1 + e^{a+bt+s\sqrt{1+m^2(t+q)^2}}] \qquad \text{Type } \gamma$$

These equations give the normal forms of the skew logistic.
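
As a check on the passage from the differential forms (17) to the integral forms (18), the following sketch evaluates both sides of (17) numerically for each type. It assumes natural logarithms and exponentials throughout; the function names and the parameter values used in the check are hypothetical and serve only as an illustration.

```python
import math


def skew_exponent(t, a, b, s, m, q, kind):
    """Exponent of e in the integral forms (18)."""
    z = m * (t + q)
    if kind == "alpha":
        return a + b * t + s * math.sin(z)
    if kind == "beta":
        return a + b * t + s * math.log(1.0 + z * z)
    if kind == "gamma":
        return a + b * t + s * math.sqrt(1.0 + z * z)
    raise ValueError("kind must be 'alpha', 'beta' or 'gamma'")


def skew_logistic(t, L, H, a, b, s, m, q, kind):
    """Skew logistic p = L + H / (1 + exp(exponent))."""
    return L + H / (1.0 + math.exp(skew_exponent(t, a, b, s, m, q, kind)))


def growth_factor(t, b, s, m, q, kind):
    """Bracketed factor of (17) that replaces the Malthusian constant b."""
    z = m * (t + q)
    if kind == "alpha":
        return b + s * m * math.cos(z)
    if kind == "beta":
        return b + 2.0 * s * m * z / (1.0 + z * z)
    if kind == "gamma":
        return b + s * m * z / math.sqrt(1.0 + z * z)
    raise ValueError("kind must be 'alpha', 'beta' or 'gamma'")


def check_ode(t, L, H, a, b, s, m, q, kind, h=1e-5):
    """Return both sides of (17), the left by a central difference."""
    p_prime = skew_logistic(t, L, H, a, b, s, m, q, kind) - L
    dp = (skew_logistic(t + h, L, H, a, b, s, m, q, kind)
          - skew_logistic(t - h, L, H, a, b, s, m, q, kind)) / (2.0 * h)
    lhs = dp / p_prime
    rhs = -growth_factor(t, b, s, m, q, kind) * (1.0 - p_prime / H)
    return lhs, rhs
```

Setting `s = 0` in any of the three types reduces the exponent to `a + b*t`, so the symmetric logistic (10) is recovered as a special case.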


VIII. Properties of the Skew Logistic. We may deduce the properties of
the skew logistic by examining both its differential and integral forms.
Considering the derivative of Type $\alpha$, we note that the Malthusian constant $b$ is
replaced by a trigonometric function whose amplitude is $b \pm sm$ and whose
phase depends on the values of $m$ and $q$. When $b \pm sm = 0$, the derivative
must also equal zero, and a flat point in the curve of $p$ is indicated. When $b$ is
absolutely less than $sm$, the derivative changes sign and the curve of $p$ reverses
its direction. Thus, the integral form of Type $\alpha$ modifies the symmetric form
of the logistic by a succession of minor cycles in which the rate of growth is
alternately accelerated and retarded.

Considering Type $\beta$, we find the Malthusian constant replaced by a function
whose maximum and minimum values are attained when $t + q = \pm m^{-1}$.
Obviously, therefore, this function passes through a single period whose amplitude
is $b \pm sm$, and whose phases are $b$, $b + sm$, $b$, $b - sm$, $b$. When $b \pm sm = 0$,
a flat point in the curve of $p$ is generated. The effect of skewness on the rate of
growth passes through two double phases. Where $b$ and $s$ are of the same sign,
these phases are: first, increasing retardation followed by decreasing retardation
when $t + q$ is negative; and, second, increasing acceleration followed by
decreasing acceleration when $t + q$ is positive. Where $b$ and $s$ are of opposite sign,
the corresponding phases are: first, increasing acceleration followed by
decreasing acceleration when $t + q$ is negative; and, second, increasing retardation
followed by decreasing retardation when $t + q$ is positive. It is to be noted
that, when $sm$ is absolutely greater than $b$, the derivative will change sign twice
before the upper limit is reached. Under these circumstances, the function $p$
passes through a double reversal of direction.

Considering Type $\gamma$, we find the Malthusian constant of the derivative
replaced by a function which is aperiodic and which approaches the limits $b \pm sm$
as $t$ approaches $\pm\infty$. When $b$ and $s$ are of the same sign, skewness passes through
the two following phases: first, the phase of decreasing retardation when $t + q$
is negative; and, second, the phase of increasing acceleration when $t + q$ is
positive. On the other hand, when $b$ and $s$ are of opposite sign, the
corresponding phases are: first, that of decreasing acceleration when $t + q$ is negative; and,
second, that of increasing retardation when $t + q$ is positive. When $sm$ is
absolutely greater than $b$, the derivative changes sign, and the function $p$ passes
from a continuously increasing phase to a continuously decreasing phase, or
vice versa.

In general, it may be said of all three types, $\alpha$, $\beta$ and $\gamma$, that, if the derivative
is not restricted to a single change of sign, $L$ denotes a lower asymptote of the
function $p$; while, under the same conditions, $H$ denotes the higher limit
approached by the function $p - L$. When $H$ is negative, the effect is to make $L$
an upper, and $L - H$ a lower, asymptote of the curve $p$.

In the case of Type $\gamma$, when the function $p$ makes a single change of sign,
either $H$ or $L$ becomes a maximum (or minimum) value instead of an asymptote
of the curve. In this event, it will be noted that the factor $1 - H^{-1}p'$ appearing
in the derivative does not approach zero as a limit with increasing values of $t$,
but rather passes through a minimum and then approaches the limit 1 in either
direction.





The parameter $s$ may be positive or negative in sign, and is termed the index
of skewness or, briefly, the skewness of the function. Obviously, $m$ is always
positive, and, since it determines the rate at which skewness develops, is properly
termed the development. The point in time at which skewness passes from an
accelerating to a retarding phase, or vice versa, is fixed by the value of $q$, which is,
therefore, termed the transition. The parameter $b$, as has already been stated,
is termed the stable growth tendency or, technically, the stability of the function.
And since the position of the curve $p$ on an arbitrary time scale will vary with
the value of $a$, this parameter I have designated the location.

In all three types of the skew logistic, if the exponent of $e$ in (18) is a
continuously decreasing function and both $H$ and $L$ are positive, the curve of $p$ may be
described as of the rising hillside form. In the case of Type $\gamma$, if the derivative
changes from positive to negative sign, the curve may be described as mountain
formed. If the exponent increases continuously, the curve is of the falling hillside
variety, except when the derivative of Type $\gamma$ changes from negative to positive
sign, in which event a valley form is generated.


Part IV

IX. Parameters of the Symmetric Logistic. The numerical parameters of the
symmetric logistic (10) are most easily determined by the method of differences.
First, we write,

$$p_i^{-1} = C + e^{A+bt}, \tag{19}$$

where $C = H^{-1}$; $A = a - \log H$; and $i = 0, 1, 2, \cdots, n - 1$.

Assuming $\Delta t$ constant, let us give to $t$ the increment $k\Delta t$, thus:

$$p_{i+k}^{-1} = C + e^{A+b(t+k\Delta t)}. \tag{20}$$

Subtracting (19) from (20), we obtain

$$\Delta_k p_i^{-1} = e^{A+bt}(e^{bk\Delta t} - 1) = Be^{A+bt}, \tag{21}$$

where $B = e^{bk\Delta t} - 1$. The quantity $\Delta_k p_i^{-1} = p_{i+k}^{-1} - p_i^{-1}$ is termed a first order
difference of rank $k$.

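Giving to $t$ in equation (21) the increment $k\Delta t$, we get

$$\Delta_k p_{i+k}^{-1} = Be^{A+b(t+k\Delta t)}. \tag{22}$$

Dividing (22) by (21), we have,

$$\Delta_k p_{i+k}^{-1} : \Delta_k p_i^{-1} = e^{bk\Delta t}.$$

Taking logarithms, we obtain

$$\Delta_k \log \Delta_k p_i^{-1} = \log \Delta_k p_{i+k}^{-1} - \log \Delta_k p_i^{-1} = bk\Delta t,$$

which defines the parameter $b$. We can form $n - 2k$ such equations. Hence,
$b$ is uniquely determined by the relation

$$b = [\Sigma\,\Delta_k \log \Delta_k P_i^{-1}] : [k(n - 2k)\Delta t]$$

$$= [\Sigma_{i=k}^{n-k-1} \log \Delta_k P_i^{-1} - \Sigma_{i=0}^{n-2k-1} \log \Delta_k P_i^{-1}] : [k(n - 2k)\Delta t], \tag{23}$$

where $k = n : 3$ to the nearest integer.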

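Returning to (21), we have the following relation determining the value of $A$:

$$A = \log[\Sigma\,\Delta_k P_i^{-1}] - \log[B\,\Sigma_{i=0}^{n-k-1} e^{bt}]$$

$$= \log[\Sigma_{i=k}^{n-1} P_i^{-1} - \Sigma_{i=0}^{n-k-1} P_i^{-1}] - \log[B\,\Sigma_{i=0}^{n-k-1} e^{bt}], \tag{24}$$

where $k = n : 2$ to the nearest integer.

From equation (19), we have

$$C = [\Sigma_{i=0}^{n-1} P_i^{-1} - e^{A}\,\Sigma_{i=0}^{n-1} e^{bt}] : n. \tag{25}$$

The values of $H$ and $a$ are, obviously, given by

$$H = C^{-1}, \tag{26}$$

$$a = A + \log H. \tag{27}$$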

In the relations defining $b$, $A$ and $C$, the values of $P$ must be obtained from
the observations. In computing the values of $k$, the formula is:

$$k = n(r + 1)^{-1},$$

where $n$ is the number of observations, and $r$ denotes the order of reduction
involved in the defining relation.

In my first treatment of the subject, I assumed that the value of $k$ for all
orders of reduction might be determined from the reduction of highest order
involved; but I have since found that I erred in this view. The point is that the
function $\phi(k) = k^r(n - rk)$, discussed in the original memoir, must be
maximized with respect to $k$ separately for each order of difference involved; or, in
other words, the rank constant $k$ must be given a separate determination for
each parameter defined if the most accurate results are to be obtained.
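
The method of differences of this section can be sketched as follows. This is an illustration under stated assumptions rather than the author's own computing form: natural logarithms are used throughout, logarithms of negative differences are taken on their absolute values, the rank for (24) is taken as `n // 2` (which gives 7 for `n = 15`), and the function name is hypothetical. With noisy observations the ratio inside the logarithm for $A$ could turn negative, a case this sketch does not handle.

```python
import math


def fit_symmetric_logistic(P, dt=1.0):
    """Estimate H, a, b of p = H / (1 + exp(a + b t)) by differences."""
    n = len(P)
    R = [1.0 / p for p in P]  # reciprocals P_i^{-1}
    t = [i * dt for i in range(n)]

    # Form of equation (23): b from differences of log differences, k = n:3.
    k = round(n / 3)
    log_d = [math.log(abs(R[i + k] - R[i])) for i in range(n - k)]
    b = sum(log_d[i + k] - log_d[i] for i in range(n - 2 * k)) / (
        k * (n - 2 * k) * dt)

    # Form of equation (24): A from first-order differences, k = n:2.
    k = n // 2
    B = math.exp(b * k * dt) - 1.0
    sum_d = sum(R[i + k] - R[i] for i in range(n - k))
    sum_e = sum(math.exp(b * t[i]) for i in range(n - k))
    A = math.log(sum_d / (B * sum_e))

    # Forms of equations (25), (26) and (27).
    C = (sum(R) - math.exp(A) * sum(math.exp(b * ti) for ti in t)) / n
    H = 1.0 / C
    a = A + math.log(H)
    return H, a, b


# Noise-free illustrative series; the constants are not those of the memoir.
series = [200.0 / (1.0 + math.exp(1.7 - 0.14 * i)) for i in range(15)]
H_fit, a_fit, b_fit = fit_symmetric_logistic(series)
```

On a noise-free series the three constants are recovered to rounding error, since every relation used is then exact; with real observations the sums act as averages over the available equations.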

X. Parameters of the Skew Logistic. I shall now show how the method
of differences may be used to abridge the computations involved in applying
the theta technique to the determination of the parameters of the skew logistic.
In this, as in the preceding section, we assume $\Delta t$ constant.

Operating on Type $\gamma$ of equation (18), we write

$$p_t = L + H : [1 + e^{a+bt+s\sqrt{1+m^2(t+q)^2}}]. \tag{28}$$

To begin with, let us write the transformation of ordinate

$$G = \log[H(p - L)^{-1} - 1].$$

Also, let us write

$$F = \sqrt{1 + m^2(t + q)^2}.$$



We may now rewrite equation (28) in the form

$$G_i = a + bt + sF_i. \tag{29}$$

Giving to $t$ the increment $k\Delta t$, we have

$$G_{i+k} = a + b(t + k\Delta t) + sF_{i+k}. \tag{30}$$

Subtracting (29) from (30), we have,

$$\Delta_k G_i = bk\Delta t + s\Delta_k F_i. \tag{31}$$

Again giving to $t$ the increment $k\Delta t$, we obtain

$$\Delta_k G_{i+k} = bk\Delta t + s\Delta_k F_{i+k}. \tag{32}$$

Subtracting (31) from (32), we obtain

$$\Delta_k G_{i+k} - \Delta_k G_i = (bk\Delta t - bk\Delta t) + s(\Delta_k F_{i+k} - \Delta_k F_i),$$

or

$$\Delta_k^2 G_i = s\Delta_k^2 F_i. \tag{33}$$

We can form $n - 2k$ such equations, and may, therefore, form $n - 2k$
approximations to the value of the parameter $s$, as follows:

$$s_i = [\Delta_k^2 G_i] : [\Delta_k^2 F_i]; \qquad i = 0, 1, \cdots, n - 2k - 1.$$

Taking the mean value of the set $s_i$ as its most probable value, we have,

$$s_0(HL\cdot mq) = \Sigma s_i : (n - 2k); \qquad k = n : 3 \text{ to the nearest integer}. \tag{34}$$

In this determination of $s_0$, the only parameters directly involved are $H$, $L$, $m$
and $q$, the parameters $a$ and $b$ having been eliminated. By assigning values to
$H_0$, $L_0$, $m_0$ and $q_0$, we may, on setting up the arbitrary corrections $\Delta H$, $\Delta L$, $\Delta m$
and $\Delta q$, write down the following:

$$H_1 = H_0 - \Delta H; \quad H_2 = H_0 + \Delta H; \quad L_1 = L_0 - \Delta L; \quad L_2 = L_0 + \Delta L;$$

$$m_1 = m_0 - \Delta m; \quad m_2 = m_0 + \Delta m; \quad q_1 = q_0 - \Delta q; \quad q_2 = q_0 + \Delta q.$$

Since $s_0$ is a function of $H$, $L$, $m$ and $q$, we may, by entering the subscripts of
the combination $HL\cdot mq$, tabulate the possible determinations of $s_0$ as follows:


    11·11   11·12   11·21   11·22
    12·11   12·12   12·21   12·22
    21·11   21·12   21·21   21·22
    22·11   22·12   22·21   22·22

In this tabulation, the subscripts of parameters are in the order of $HL\cdot mq$;
so that 12·21 denotes $s_0(H_1L_2\cdot m_2q_1)$, etc.

From the table, it is seen that we may compute $2^4 = 16$ distinct sets of
approximations to $s_0(HL\cdot mq)$. Since the true values of $H$, $L$, $m$ and $q$ are unknown,
each set of approximations $s_i$ will show a characteristic variation about its mean
value, $s_0$. This variation is most conveniently measured by the mean deviation

$$\epsilon = (s_0 - s_0')\,2N' : N = (s_0'' - s_0)\,2N'' : N, \tag{35}$$

where the second relation serves as a check on the computation by the first;
$N = n - 2k$; $N'$ denotes the number of items $s_i$ which are less than $s_0$ in value,
and $N''$, the number of items $s_i$ which are greater than $s_0$ in value; while $s_0'$ denotes
the mean of the $N'$ values of $s_i$ which are less than $s_0$, and $s_0''$, the mean of the
$N''$ values of $s_i$ which are greater than $s_0$.

The reliability of a given value of $s_0$ as a measure of the central tendency of
the corresponding set $s_i$ is sufficiently determined by $\epsilon$, which serves at the
same time to measure the reliability of the combination $HLmq$ figuring in the
computation of the given set $s_i$. We may, therefore, compute the values of the
test constant, $\omega$, directly from the values of $\epsilon$ by means of the relation,

$$\omega_j(HL\cdot mq) = \epsilon_j^{-2}\,[\epsilon_{11\cdot11}^{-2} + \epsilon_{11\cdot12}^{-2} + \cdots + \epsilon_{22\cdot22}^{-2}]^{-1}, \tag{36}$$

where $j = 11\cdot11, 11\cdot12, \cdots, 22\cdot22$; $\Sigma\omega = 1$.

Since four values of theta are to be determined, we must arrange the sixteen
values of omega in four ways, as shown by the following tabulation of subscripts:

    ω(H₁)   ω(H₂)   ω(L₁)   ω(L₂)   ω(m₁)   ω(m₂)   ω(q₁)   ω(q₂)

    11·11   21·11   11·11   12·11   11·11   11·21   11·11   11·12
    11·12   21·12   11·12   12·12   11·12   11·22   11·21   11·22
    11·21   21·21   11·21   12·21   12·11   12·21   12·11   12·12
    11·22   21·22   11·22   12·22   12·12   12·22   12·21   12·22
    12·11   22·11   21·11   22·11   21·11   21·21   21·11   21·12
    12·12   22·12   21·12   22·12   21·12   21·22   21·21   21·22
    12·21   22·21   21·21   22·21   22·11   22·21   22·11   22·12
    12·22   22·22   21·22   22·22   22·12   22·22   22·21   22·22


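Knowing the values of omega, we have at once,

$$\theta(H) = \Sigma\omega(H_2) - \Sigma\omega(H_1); \qquad \theta(m) = \Sigma\omega(m_2) - \Sigma\omega(m_1);$$

$$\theta(L) = \Sigma\omega(L_2) - \Sigma\omega(L_1); \qquad \theta(q) = \Sigma\omega(q_2) - \Sigma\omega(q_1). \tag{37}$$

$$H = H_0 + \theta(H)\cdot\Delta H; \qquad m = m_0 + \theta(m)\cdot\Delta m;$$

$$L = L_0 + \theta(L)\cdot\Delta L; \qquad q = q_0 + \theta(q)\cdot\Delta q. \tag{38}$$

The process of adjustment should be repeated until errors in the parameters
diminish to negligible proportions.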

With $H$, $L$, $m$ and $q$ known to a sufficient approximation, we may form anew
the functions $G(H, L, m, q)$ and $F(H, L, m, q)$. We can then write $n - 2k$
equations of form (33), viz.:

$$\Delta_k^2 G_i = s\Delta_k^2 F_i.$$

Summing these equations, we have,

$$\Sigma\,\Delta_k^2 G_i = s\,\Sigma\,\Delta_k^2 F_i, \tag{39}$$

where

$$\Sigma\,\Delta_k^2 G_i = \Sigma_{i=2k}^{n-1} G_i - 2\,\Sigma_{i=k}^{n-k-1} G_i + \Sigma_{i=0}^{n-2k-1} G_i;$$

$$\Sigma\,\Delta_k^2 F_i = \Sigma_{i=2k}^{n-1} F_i - 2\,\Sigma_{i=k}^{n-k-1} F_i + \Sigma_{i=0}^{n-2k-1} F_i;$$

where $k = n : 3$ to the nearest integer.

The approximate value of $s$ is now obtained from the relation

$$s = \Sigma\,\Delta_k^2 G_i : \Sigma\,\Delta_k^2 F_i. \tag{40}$$

Returning to equation (31), we solve for $bk\Delta t$, obtaining,

$$bk\Delta t = \Delta_k G_i - s\Delta_k F_i.$$

Since we can form $n - k$ such equations, the approximate value of $b$ is given
by the relation

$$b = [\Sigma\,\Delta_k G_i - s\,\Sigma\,\Delta_k F_i] : [k(n - k)\Delta t] \tag{41}$$

$$= [(\Sigma_{i=k}^{n-1} G_i - \Sigma_{i=0}^{n-k-1} G_i) - s(\Sigma_{i=k}^{n-1} F_i - \Sigma_{i=0}^{n-k-1} F_i)] : [k(n - k)\Delta t],$$

where $k = n : 2$ to the nearest integer.

From equation (29), we obtain the approximate value of $a$ as follows:

$$a = [\Sigma G_i - b\,\Sigma t - s\,\Sigma F_i] : n. \tag{42}$$

Comparing the abridged method of computing the values of theta here
outlined with the general procedure of section II, it will be seen that we have been
able to reduce the number of values of omega which it is necessary to determine
from $2^7 = 128$ to $2^4 = 16$. In cases where $L$ may be assumed to equal zero,
the number of values of omega which must be computed is further reduced to
$2^3 = 8$.
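
Once $H$, $L$, $m$ and $q$ are fixed, relations (40), (41) and (42) yield $s$, $b$ and $a$ directly. The sketch below is an illustration for Type $\gamma$ under stated assumptions: natural logarithms are used, the rank for (41) is taken as `n // 2`, and the function name and the test constants are hypothetical.

```python
import math


def fit_s_b_a(P, H, L, m, q, dt=1.0):
    """Estimate s, b, a of the Type gamma skew logistic for given H, L, m, q."""
    n = len(P)
    t = [i * dt for i in range(n)]
    G = [math.log(H / (p - L) - 1.0) for p in P]  # transformed ordinate
    F = [math.sqrt(1.0 + (m * (ti + q)) ** 2) for ti in t]

    def sum_second_diff(X, k):
        """Sum of second-order differences of rank k."""
        return sum(X[i + 2 * k] - 2.0 * X[i + k] + X[i]
                   for i in range(n - 2 * k))

    def sum_first_diff(X, k):
        """Sum of first-order differences of rank k."""
        return sum(X[i + k] - X[i] for i in range(n - k))

    k = round(n / 3)
    s = sum_second_diff(G, k) / sum_second_diff(F, k)  # form of (40)
    k = n // 2
    b = (sum_first_diff(G, k) - s * sum_first_diff(F, k)) / (
        k * (n - k) * dt)  # form of (41)
    a = (sum(G) - b * sum(t) - s * sum(F)) / n  # form of (42)
    return s, b, a


# Noise-free illustrative series; the constants are not those of the memoir.
true = dict(L=0.0, H=200.0, a=1.7, b=-0.14, s=0.05, m=1.0, q=-6.0)
series = [
    true["L"] + true["H"] / (1.0 + math.exp(
        true["a"] + true["b"] * i
        + true["s"] * math.sqrt(1.0 + (true["m"] * (i + true["q"])) ** 2)))
    for i in range(15)
]
s_fit, b_fit, a_fit = fit_s_b_a(series, true["H"], true["L"], true["m"], true["q"])
```

The second-order differences remove both $a$ and the linear term in $t$, which is why $s$ can be found before $b$, and $b$ before $a$.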

Part V

XI. Symmetric Parameters for the Population of the United States. I
have determined the numerical values of the parameters of both the
symmetric and the skew forms of the logistic from the population figures for the
United States given by the Bureau of the Census. The only departure in the
data from the census figures consists in the interpolation of all items to June
1st as the date of observation. The values of the symmetric parameters are
computed from the data of Table I, as follows.

Setting $k = 15 \div 3 = 5$, we have, by equation (23),

$$\Sigma\,\Delta_5 \log \Delta_5 P_i^{-1} = \Sigma_5^9 \log \Delta_5 P_i^{-1} - \Sigma_0^4 \log \Delta_5 P_i^{-1}$$

$$= \bar{9}.71878_n - \bar{5}.14555_n = -3.42677.$$



TABLE I

Data for the Symmetric Logistic

     i     P⁻¹        Δ₅P⁻¹      log Δ₅P⁻¹     10^(bt)

     0   0.25582   -0.19724    1̄.29500n    1.00000
     1   0.18939   -0.14627    1̄.16516n    0.72934
     2   0.13885   -0.10704    1̄.02955n    0.53193
     3   0.10431   -0.07838    2̄.89421n    0.38796
     4   0.07770   -0.05776    2̄.76163n    0.28295
     5   0.05858   -0.04269    2̄.63033n    0.20637
     6   0.04312   -0.02996    2̄.47654n    0.15051
     7   0.03181   -0.02098    2̄.32181n    0.10978
     8   0.02593   -0.01650    2̄.21748n    0.08006
     9   0.01994   -0.01182    2̄.07262n    0.05839
    10   0.01589                            0.04259
    11   0.01316                            0.03106
    12   0.01083                            0.02265
    13   0.00943                            0.01652
    14   0.00812                            0.01205

     Σ   1.00288   -0.70864   1̄4̄.86433n    3.66216







TABLE II(a)

Data for the Skew Logistic

     i      G₁          G₂         F₁₁        F₁₂        F₂₁        F₂₂

     0   +1.67998    +1.71132    6.47765    3.35261    9.65194    4.90306
     1   +1.54968    +1.57779    5.68859    2.60000    8.45931    3.73631
     2   +1.40690    +1.43878    4.90306    1.88680    7.26911    2.60000
     3   +1.27698    +1.30927    4.12311    1.28062    6.08276    1.56205
     4   +1.14130    +1.17416    3.35261    1.00000    4.90306    1.00000
     5   +1.00816    +1.04179    2.60000    1.28062    3.73631    1.56205
     6   +0.85948    +0.89428    1.88680    1.88680    2.60000    2.60000
     7   +0.70540    +0.74193    1.28062    2.60000    1.56205    3.73631
     8   +0.59699    +0.63515    1.00000    3.35261    1.00000    4.90306
     9   +0.44841    +0.48956    1.28062    4.12311    1.56205    6.08276
    10   +0.30840    +0.35346    1.88680    4.90306    2.60000    7.26911
    11   +0.17992    +0.22981    2.60000    5.68859    3.73631    8.45931
    12   +0.02885    +0.08647    3.35261    6.47765    4.90306    9.65194
    13   -0.09590    -0.02968    4.12311    7.26911    6.08276   10.84620
    14   -0.25808    -0.17670    4.90306    8.06226    7.26911   12.04159

     Σ  +10.83647   +11.47739   49.45864   55.76384   71.41783   80.95375



TABLE II(b)

Data for the Skew Logistic

     i    Δ₅²G₁      Δ₅²G₂      Δ₅²F₁₁     Δ₅²F₁₂     Δ₅²F₂₁     Δ₅²F₂₂

     0   -0.02794   -0.01880    3.16445    5.69443    4.77932    9.04807
     1   +0.01064   +0.01904    4.51499    4.51499    6.99562    6.99562
     2   +0.02495   +0.04139    5.69443    3.16445    9.04807    4.77932
     3   -0.01290   +0.00929    6.24622    1.84451   10.16552    2.60213
     4   -0.01360   +0.01834    5.69443    0.81604    9.04807    0.87607

     Σ   -0.01885   +0.06926   25.31452   16.03442   40.03660   24.30121

We note that $k(n - 2k)\Delta t = 5(15 - 10)1 = 25$; hence,

$$b = -3.42677 \div 25 = -0.1370708.$$

Next, set $k = 15 \div 2 = 7$, to the nearest integer; then, by equation (24), we
get

$$\Sigma\,\Delta_7 P_i^{-1} = \Sigma_8^{14} P_i^{-1} - \Sigma_0^{6} P_i^{-1} = 0.10330 - 0.86777 = -0.76447;$$

$$B = 10^{bk\Delta t} - 1 = 10^{-0.1370708 \times 7} - 1 = -0.89022; \qquad \Sigma_0^7\,10^{bt} = 3.39884.$$

Hence,

$$A = \log[-0.76447] - \log[-0.89022 \times 3.39884] = \bar{1}.4025324.$$

We have next

$$\Sigma P_i^{-1} = 1.00288; \qquad \Sigma\,10^{bt} = 3.66216; \qquad 10^{A} = 0.25266.$$

By equation (25), then, we obtain

$$C = [1.00288 - 0.25266 \times 3.66216] \div 15 = 0.0051747.$$

By equation (26), we get

$$H = C^{-1} = 193.25.$$

Finally, by equation (27), we obtain

$$a = A + \log H = \bar{1}.4025324 + 2.2861136 = 1.68865.$$

The point of inflection of the curve is given by

$$t_i = -a : b = 1.68865 \div 0.1370708 = 12.319.$$
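
The arithmetic of this section can be retraced from the printed sums alone. The sketch below is only a check on the chain of computations under stated assumptions: it uses common logarithms, as the worked figures do, and takes the column totals as read from Table I and from the text, so small discrepancies in the last figures are to be expected from rounding. The variable names are hypothetical.

```python
import math

n, dt = 15, 1.0

# Equation (23) with k = 5, using the printed difference of log sums.
k = 5
b = -3.42677 / (k * (n - 2 * k) * dt)

# Equation (24) with k = 7, using the printed partial sums.
k = 7
B = 10.0 ** (b * k * dt) - 1.0
sum_diff = 0.10330 - 0.86777   # sum of first-order differences of rank 7
sum_pow_0_7 = 3.39884          # sum of 10^(bt) for t = 0, ..., 7
A = math.log10(sum_diff / (B * sum_pow_0_7))

# Equations (25), (26) and (27), using the printed column totals.
C = (1.00288 - 10.0 ** A * 3.66216) / n
H = 1.0 / C
a = A + math.log10(H)

# Point of inflection, equation (12).
t_inflection = -a / b
```

Run as written, the values agree with the printed results to within the rounding of the tabulated sums; the upper limit `H` is the quantity most sensitive to the last decimal of `C`.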

XII. Skew Parameters for the Population of the United States. Assuming
$L = 0$, we form

$$H_1 = 198.0 - 7.0 = 191.0; \qquad H_2 = 198.0 + 7.0 = 205.0.$$

$$m_1 = 1.0 - 0.2 = 0.8; \qquad m_2 = 1.0 + 0.2 = 1.2.$$

$$q_1 = -6.0 - 2.0 = -8.0; \qquad q_2 = -6.0 + 2.0 = -4.0.$$

Next, the primary data of Tables II(a) and II(b) are computed. Setting
$k = 15 \div 3 = 5$, $n - 2k$ values of the $2^3$ sets of $s_i$ are determined and entered
in Table III(a). The values of $s_0$, $\epsilon$ and $\omega$ for each set are computed by
equations (34), (35) and (36).

In Table III(b), the several values of $\omega$ are arranged according to their
association: first, with $H_1$, $H_2$; second, with $m_1$, $m_2$; and, third, with $q_1$, $q_2$. The column
sums yield the weights $\Sigma\omega$. The values of $\theta$ and the adjusted values of
parameters are computed by equations (37) and (38):


TABLE III(a)

Data for the Computation of θ

 t      s(111)     s(112)     s(121)     s(122)     s(211)     s(212)     s(221)     s(222)
 0    -0.00883   -0.00491   -0.00585   -0.00309   -0.00594   -0.00330   -0.00393   -0.00208
 1    +0.00236   +0.00236   +0.00152   +0.00152   +0.00422   +0.00422   +0.00272   +0.00272
 2    +0.00438   +0.00788   +0.00276   +0.00522   +0.00727   +0.01308   +0.00457   +0.00866
 3    -0.00207   -0.00699   -0.00127   -0.00496   +0.00149   +0.00504   +0.00091      …
 4    -0.00239   -0.01667   -0.00150   -0.01552   +0.00322   +0.02247   +0.00203   +0.02093
 Σ    -0.00655   -0.01832   -0.00434   -0.01683   +0.01026   +0.04151   +0.00630   +0.03380
 s_0  -0.00131   -0.00366   -0.00087   -0.00337   +0.00205   +0.00830   +0.00126   +0.00676
 ε    +0.00374   +0.00703   +0.00241   +0.00550   +0.00342   +0.00404   +0.00222   +0.00643
 ω    +0.10624   +0.03012   +0.25711   +0.04919   +0.12708   +0.09137   +0.30290   +0.03601


TABLE III(b)

Data for the Computation of θ

       ω(H_1)    ω(H_2)    ω(m_1)    ω(m_2)    ω(q_1)    ω(q_2)
       0.1062    0.1271    0.1062    0.2571    0.1062    0.0301
       0.0301    0.0914    0.0301    0.0492    0.2571    0.0492
       0.2571    0.3029    0.1271    0.3029    0.1271    0.0914
       0.0492    0.0360    0.0914    0.0360    0.3029    0.0360
 Σ     0.4426    0.5574    0.3548    0.6452    0.7933    0.2067


TABLE IV(a)

Summary of Adjustments

Parameter   Estimated Value       Δ          θ           θΔ        Adjusted Value
    H           +198.0          +7.0      +0.1148     +0.8036        +198.80
    m             +1.0          +0.2      +0.2904     +0.05808         +1.05808
    q             -6.0          +2.0      -0.5866     -1.1732          -7.1732
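The adjustments summarized in Table IV(a) can be reproduced from the column sums of Table III(b). Equations (37) and (38) are not restated in this section, so the sketch below rests on an assumption: that θ is the normalized difference of the two weight sums, and that the adjusted value is the estimate plus θΔ. That reading matches the tabulated figures; the dictionary and function names are hypothetical.

```python
# Column sums of the weights from Table III(b):
# (sum associated with the lower trial value, sum associated with the upper one).
weight_sums = {"H": (0.4426, 0.5574), "m": (0.3548, 0.6452), "q": (0.7933, 0.2067)}
estimates = {"H": 198.0, "m": 1.0, "q": -6.0}
steps = {"H": 7.0, "m": 0.2, "q": 2.0}   # the half-ranges, Delta


def adjust(name):
    """Return (theta, adjusted value) for one parameter.

    Assumption: theta = (w_upper - w_lower) / (w_upper + w_lower).
    """
    w_lower, w_upper = weight_sums[name]
    theta = (w_upper - w_lower) / (w_upper + w_lower)
    return theta, estimates[name] + theta * steps[name]


adjusted = {name: adjust(name) for name in estimates}
for name, (theta, value) in adjusted.items():
    print(f"{name}: theta = {theta:+.4f}, adjusted value = {value:+.5f}")
```

A θ near +1 or -1 would indicate that the better value lies at or beyond one end of the trial range, which is why the range Δ has to be chosen wide enough.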





TABLE IV(b)
Final Transformations

  t      G(Hmq)       F(Hmq)
  0     1.69772      7.65559
  1     1.56410      6.60800
  2     1.42495      5.46440
  3     1.29526      4.52752
  4     1.15991      3.50336
  5     1.02722      2.50753
  6     0.87921      1.59408
  7     0.72613      1.01784
  8     0.61866      1.32865
  9     0.47182      2.17626
 10     0.33408      3.15374
 11     0.20842      4.17075
 12     0.06189      5.20418
 13     $\bar{1}$.94223      6.24580
 14     $\bar{1}$.78913      7.29229
  Σ    11.20073     62.54999


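Finally, the functions G and F are formed anew from the adjusted values of
H, m, q. The adjusted values of s, b and a are computed by equations (40),
(41) and (42), as follows:

$s = \left[\sum_{10}^{14} G_t - 2\sum_{5}^{9} G_t + \sum_{0}^{4} G_t\right] \div \left[\sum_{10}^{14} F_t - 2\sum_{5}^{9} F_t + \sum_{0}^{4} F_t\right]$

$= [0.33574 - 2 \times 3.72304 + 7.14194] \div [26.06676 - 2 \times 8.62436 + 27.85887]$

$= 0.03161 \div 36.67691 = 0.00086185$.

$b = \left[\sum_{8}^{14} G_t - \sum_{0}^{6} G_t - s\left(\sum_{8}^{14} F_t - \sum_{0}^{6} F_t\right)\right] \div [k(n - k)\Delta t]$

$= [1.42623 - 9.04837 - 0.00086185(29.57167 - 31.96048)] \div [7(15 - 7)1]$

$= [-7.62214 - 0.00086185 \times (-2.38881)] \div [56] = -0.13607$.

$a = \left[\sum_{0}^{14} G_t - b\sum_{0}^{14} t - s\sum_{0}^{14} F_t\right] \div n$

$= [11.20073 - (-0.13607 \times 105) - 0.00086185 \times 62.54999] \div 15$

$= 1.69561$.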
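The three adjusted constants follow from partial sums of the G and F columns of Table IV(b). The sketch below simply repeats that arithmetic with the sums as quoted in the text; it is an illustration under the assumption that the groupings are t = 0 to 4, 5 to 9 and 10 to 14 for s, and t = 0 to 6 and 8 to 14 for b, which is what the quoted numbers correspond to. Variable names are illustrative.

```python
# Partial sums of G and F over t = 0..4, 5..9 and 10..14, as quoted.
G_low, G_mid, G_high = 7.14194, 3.72304, 0.33574
F_low, F_mid, F_high = 27.85887, 8.62436, 26.06676

# Skewness constant: ratio of the second differences of the grouped sums.
s = (G_high - 2 * G_mid + G_low) / (F_high - 2 * F_mid + F_low)

# Slope: grouped sums over t = 8..14 and t = 0..6, with k = 7, n = 15, unit interval.
k, n, dt = 7, 15, 1
b = (1.42623 - 9.04837 - s * (29.57167 - 31.96048)) / (k * (n - k) * dt)

# Constant term: full sums over t = 0..14; the sum of t itself is 105.
sum_t = sum(range(n))
a = (11.20073 - b * sum_t - s * 62.54999) / n

print(f"s = {s:.8f}, b = {b:.5f}, a = {a:.5f}")
```

Small differences in the last printed digit relative to the text are to be expected, since the quoted sums are themselves rounded.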

In the present case, the values of $H_0$, $m_0$ and $q_0$ were known within definite
limits from previous experimentation. The values of the corrections, θΔ,
were, on this account, smaller than should ordinarily be expected from a first
application of the technique. Always, it is necessary to take Δ sufficiently
large to insure θ < 1. As a preliminary step, it is not infrequently advantageous
to compute trial values of t by holding constant each two of the parameters
$H_0$, $m_0$ and $q_0$ while experimenting roughly with the third.





TABLE V(a)
Ordinates of Fitted Curves

Year    Census     Symmetric    Percentage    Skew         Percentage
        Count      Ordinates    Deviations    Ordinates    Deviations
1790      3.909       3.88        -0.78          3.87        -0.01
1800      5.280       5.28        -0.03          5.27        -0.25
1810      7.202       7.16        -0.52          7.15        -0.73
1820      9.587       9.69        +1.07          9.67        +0.88
1830     12.866      13.04        +1.37         13.02        +1.20
1840     17.069      17.45        +2.22         17.42        +2.09
1850     23.192      23.15        -0.20         23.13        -0.28
1860     31.443      30.38        -3.36         30.37        -3.42
1870     38.558      39.36        +2.09         39.31        +1.95
1880     50.156      50.18        +0.05         50.07        -0.18
1890     62.948      62.61        -0.31         62.60        -0.55
1900     75.995      76.79        +1.05         76.64        +0.86
1910     92.329      91.76        -0.62         91.72        -0.67
1920    106.001     106.96        +0.90        107.16        +1.09
1930    123.068     121.66        -1.14        122.23        -0.69


TABLE V(b)
Extrapolations

Year    Forecast    Sym. O.    Sk. O.       Year    Sym. O.    Sk. O.
1940     137.20     135.22     136.26       1780     2.844      2.850
1950     149.29     147.18     148.78       1770     2.083      2.095
1960     159.88     157.33     159.52       1760     1.523      1.539
1970     168.71     165.66     168.42       1750     1.113      1.130
1980     175.83     172.33     175.59       1740     0.813      0.829
1990     181.46     177.52     181.25       1730     0.594      0.608
2000     185.82     181.52     185.63       1720     0.434      0.445
2010     189.14     184.55     188.98       1710     0.316      0.280
2020     193.11     186.82     192.97       1700     0.231      0.238
2030     198.54     188.52     193.40       1690     0.168      0.173
2040     194.94     189.77     194.83       1680     0.123      0.127
2050     195.98     190.72     195.87       1620     0.090      0.092
2060     196.75     191.39     196.64       1610     0.065      0.067
2070     197.31     191.88     197.22       1600     0.048      0.049
2080     197.73     192.25     197.64       1590     0.035      0.036
2090     198.03     192.52     197.94
2100     198.25                198.17
2110     198.42                198.34
2120     198.54                198.46
2130     198.63                198.55








Part VI 

XIII. General Considerations. The technique of solution for the numeri¬ 
cal values of parameters presented in the foregoing pages is generally applicable 
to continuous functions of real variables. The abridged procedure may be 
followed whenever the given function involves a component which is linear in 
certain of the parameters: for, in such cases, it is always possible to effect a 
transformation of ordinates which will permit of the elimination of the param¬ 
eters of the linear component. In any event, the equation of the function 
may be solved for a single parameter which may then be employed, as in our 
illustration, as a means of determining the values of the test constant, omega. 

XIV. An Interpretation of Results. The equations of the symmetric and 
skew logistic curves as computed for the population of the United States are, 
written to the natural base, as follows: 

$P = 193.25 \div \left[1 + e^{3.88826 - 0.31562t}\right]$

$P = 198.80 \div \left[1 + e^{3.90429 - 0.31331t + 0.0019846\sqrt{1 + 1.0581^2 (t - 7.1732)^2}}\right]$

The amount of skewness in the second of these equations, as measured by 
the value of s, is small; but, owing to the fair size of the parameter m, it de¬ 
velops rapidly and affects the form of the curve sensibly. The major effect 
is to raise the value of the limiting population as given in the first equation by 
about six millions and to prolong the period of growth by about forty years. 
The approximate limit of 193 millions in the symmetric form is reached about 
the year 2090; while the approximate limit of 199 millions of the skew form 
is not arrived at until about the year 2130. 
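The two fitted curves can be evaluated directly to compare them with Table V(a). The sketch below is an illustration only. It assumes that t counts decades from 1790 (so that 1930 is t = 14), and it uses the constants as read from the two equations, with the symmetric slope taken as 0.31562, the natural-base equivalent of b = -0.1370708. The function names are hypothetical.

```python
import math


def symmetric(t):
    """Symmetric logistic ordinate in millions; t in decades from 1790 (assumed)."""
    return 193.25 / (1 + math.exp(3.88826 - 0.31562 * t))


def skew(t):
    """Skew logistic ordinate in millions, with the square-root skew term."""
    u = math.sqrt(1 + 1.0581 ** 2 * (t - 7.1732) ** 2)
    return 198.80 / (1 + math.exp(3.90429 - 0.31331 * t + 0.0019846 * u))


for year in (1790, 1860, 1930, 2090, 2130):
    t = (year - 1790) / 10
    print(year, round(symmetric(t), 2), round(skew(t), 2))
```

Evaluated at the census years, both functions reproduce the tabulated ordinates to within rounding, and for large t they approach their respective limits of 193.25 and 198.80 millions.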

The positive sign of s makes for a decreasing acceleration of the rate of in¬ 
crease during the earlier phases of growth and for an increasing retardation of 
this rate during the later phases, the value of q fixing the point of transition 
in the year 1861. This general epoch has often been cited by sociologists as 
marking the shift from a dominantly rural-agricultural civilization to a domi¬ 
nantly urban-industrial one. The point at which the change takes place has, 
to my knowledge, never before been defined mathematically. 

Both curves fit the observations excellently, as shown by the percentage 
deviations of Table V(a). The forecasted growth presented in Table V(b) 
is based on the skew ordinates, the formula being 

$P_t = p_t (P_{14}/p_{14})^{1/(t-14)}$, (43)

where P denotes the actual population series, observed or predicted, and p,
the skew ordinates. The assumptions of the formula are two: first, that it
is the observed population $P_{14}$ which initiates the forecasted series; and, second,
that the influence of the correction factor $P_{14}/p_{14}$ diminishes with the time.
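The forecasting rule can be checked against Table V(b). In the sketch below the exponent of the correction factor is assumed to be 1/(t - 14), with t in decades from 1790; that assumption is inferred from the tabulated forecasts, which it reproduces to the second decimal, and is not a quotation of the printed exponent. The function and argument names are hypothetical.

```python
def forecast(skew_ordinate, t, last_observed=123.068, last_fitted=122.23, t_last=14):
    """Scale a skew ordinate by a correction factor whose influence fades with time.

    Assumption: the exponent is 1 / (t - t_last), with t in decades from 1790.
    """
    correction = last_observed / last_fitted
    return skew_ordinate * correction ** (1 / (t - t_last))


# Skew ordinates for 1940, 1950 and 1960 taken from Table V(b).
for year, ordinate in ((1940, 136.26), (1950, 148.78), (1960, 159.52)):
    t = (year - 1790) // 10
    print(year, round(forecast(ordinate, t), 2))
```

The correction starts at the full ratio of the 1930 census count to the 1930 skew ordinate and shrinks toward unity, so that the forecast converges on the skew curve.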

The extrapolations of both the skew and symmetric formulas contrast with 
the results obtained by Doctors Dublin and Lotka, who predict a stationary
population of 150 millions by 1970. For the same year, the ordinates of both
the symmetric and skew curves exceed this figure, the one by 15.66, and the
other by 18.42 millions.

The limit of 150 millions referred to was arrived at by analysis of current 
tendencies in birth and death rates. The argument is that current birth rates 
are spuriously high and current death rates spuriously low because of the 
abnormally high proportion of men and women in the reproductive ages. This 
circumstance is due, in part, to the influx in the past of immigrants from
communities having a high normal birth rate, and, in part, to the high birth rates
of preceding generations of parents in this country.

After computation of the necessary corrections has been made, the true rate 
of natural increase of the white population for the registration area of the United 
States for the year 1920 is seen to be only about 5.4 per thousand instead of the 
10.7 per thousand indicated by the crude rates. For the year 1930, the actual
rate of increase is 7.5 per thousand; while the corrected or true rate turns out 
to be virtually zero. Under the interpretation of the authorities cited, the 
spurious excess of births over deaths will be entirely dissipated by the year 
1970, with the result of the stationary population predicted. 

The hazard peculiar to this method of inference arises from two assumptions
that are made: first, that the present collection and registration of vital
statistics is sufficiently reliable to make precise estimate of the true rate of natural
increase possible; second, that the tendencies of fecundity and mortality
exhibited by current data are stable.

With respect to the first assumption, the authors have this to say: 

“One factor of safety of unknown magnitude remains. There is still some
degree of laxity in the registration of births, and the figures of the true rate of
natural increase may, on that account, be somewhat larger than recorded
above.”

The caution of the authors in this statement is in contrast with the uncritical 
acceptance of their results by those who fail to grasp the implications of 
technique. 

Concerning the second assumption, it may be pointed out that many of the
tendencies exhibited by current data must be regarded as statistically
reversible. Falling birth rates due to drift of population to cities, to
postponement of marriage on the part of professional classes, to the increasing cost of
child culture, to the urbanization of rural life and to the restriction of
immigration may be definitely altered by reversals in tendency. The flow of
population may move into extraurban and subrural districts, where birth rates are
more favorable to increase. The cost of child culture may, in part, be socially
assumed. Improvement in economic conditions may lessen the drain on the
resources of the family. The tendency for rural birth rates to fall may be
checked. Immigration may increase with improving economic conditions.
Death rates may be further reduced in many age classes and for many causes.

In fine, when we attempt to project into the future the components that
determine the trend of natural increase, we encounter risks which vastly exceed
those involved in the projection of the population series itself. Most of the 
data from which component trends must be determined cover but a brief period 
of time; while population data extends back for a century and a half. In this 
connection, it is not impertinent to inquire the criterion of relevance that will 
warrant a rejection of the items of the very series we are seeking to forecast.

It is a cardinal principle of logistic theory that the growth of population 
depends primarily on the continued supply of basic resources, physical and 
social, and that the dissipation of these resources is registered in the growth 
rate of the population itself. Any tendency of a population series toward 
skewness, that is, toward departure from the symmetric type of growth, is
more likely to persist if it is systematic in character. The skew forms of the 
logistic function which we have developed permit us to measure any existing 
systematic tendency of the data toward skewness, and, therefore, to improve 
on the symmetric expectation of future growth. 

In the case of the United States population, the evidence of skewness, insofar 
as it bears on the problem of expectation, is adverse to the conclusion that the 
ultimate limit of growth will be less than the symmetric asymptote. Conceding 
the light that the analysis of current tendencies may throw on the probable 
occurrence of future deviations from trend, the best criterion of long-time 
growth remains the logistic projection. 

This statement, to be sure, does not relieve us of the necessity for recognizing 
the nature of the hazard that inheres in making a prediction from a trend 
extrapolation. The hazard involved in this type of inference arises from the 
assumption that the basic conditions of growth are stable, or, in other words, 
that the values of the parameters of the forecasting formula will remain sub¬ 
stantially unchanged with the inclusion of new observations. Time alone can 
provide the final test of the continued validity of this assumption. 

XV. The Law of Organic Growth. The law of organic growth in its most 
general form may be written: 

$p = L + H \div \left[1 + e^{a + bt + s_1 u_1 + s_2 u_2 + s_3 u_3}\right]$, (44)

where $u_1 = \sin[m_1(t + q_1)]$; $u_2 = \log[1 + m_2^2(t + q_2)^2]$; $u_3 = \sqrt{1 + m_3^2(t + q_3)^2}$.

For most practical purposes, the evaluation of thirteen parameters is out of
the question; hence, the restricted forms α, β, and γ, equation (18), will be the
ones most generally employed.

I have made use of the term law of organic growth with reference to the logistic 
forms developed because I believe these functions to be the best means yet 
devised for the representation of the sequential changes which living organisms 
regularly manifest as individuals or societies. It states, in a quantitative form, 
all that is qualitatively implied by the so-called “law of diminishing returns” as 
this is commonly invoked by economists. The special sense in which I have 
used the term law may be expressed as follows: 





A statistical law is a mathematical generalization on the behavior of a system of
observations such that the implications of the formula are in accord with the
assumptions basic to the phenomenon observed, and such that evaluations of the parameters
of the formula determined from random samples are mutually consistent.

A statistical law, then, posits a system of relations manifesting itself in the 
form of observations which must be subjected to analysis before the true nature
of their interrelations can be inferred. It expresses a probable, rather than a 
certain, inference; but, within the limitations of its claim to precision, it leaves 
reason no more free to reject its specification of reality than does a law of 
mechanics. Indeed, the point is still in dispute as to whether any law of science 
can be more than a statement of probabilities. 

In contradistinction, the term empirical formula is properly restricted to cover 
the representation of the single set of observations at hand, and bears no neces¬ 
sary relation to any larger system. A sufficient test of an empirical formula is, 
therefore, the test of fit. 

We may fit an indefinite number of formulas to a population series and obtain 
satisfactory results so far as agreement is concerned; but, on extrapolating, the 
same formulas will yield results that are patently absurd. The backward 
extrapolation for the population of the United States shown in Table V(b) 
represents the known facts as closely as could be expected when we take into 
consideration that census enumerations include aboriginal and immigrant 
populations as well as native born. Certainly, no random empirical formula, 
selected on the ground of goodness of fit, could be expected to yield as satis¬ 
factory a result. 

Logistic theory does not, then, profess to guarantee infallibility of prediction. 
A population is not a mere aggregate of unrelated individuals inhabiting a 
restricted area, but a unified organization which grows by the utilization of 
total resources. When the supply of resources is profoundly disturbed or the 
basis of organizational unity destroyed, then the basis of prediction also is 
destroyed. And such reasoning is by no means peculiar to the sphere of social 
organization; for the integrity of any purely mechanical system is likewise 
conditioned by the assumption that the basis of coherence persists. 

At this point, those in whom the speculative disposition is strong may query: 
if statistical prediction does not yield a certain result, is it, in the final analysis, 
superior to the ready and far less expensive method of guessing? 

In answer, I can only say that, a posteriori, we can always, among a sufficiently
large batch of guessers, find someone who has guessed well; but how, a priori,
are we to know the good guesser from the poor? A population series consists
of definite magnitudes, and any prediction of its development must result in
the selection, out of a vast array of possible magnitudes, that which is most
consistent with all the known facts. The gambler may elect to hazard his
stake on the result of a random estimate; but the prudent will give heed to the 
exacting, if laborious, procedure of mathematical analysis. 





ADDENDUM 

Another solution of the theoretical problem stated in Section I may here be 
noted. 

Given, as before, the function $y = f(x, a, b, \cdots)$, we may, by assigning three
approximate values to each parameter, compute $3^p$ sets of values for the function
y, thus:

$y_{11} = f(x, a_1 b_1 \cdots)$; $y_{12} = f(x, a_1 b_2 \cdots)$; $y_{13} = f(x, a_1 b_3 \cdots)$; etc.

From the observations Y, we may compute $3^p$ sets of the residuals y - Y;
and from these several sets of residuals, the corresponding standard errors of
estimate, σ, may be computed for each set of values of the function y; thus, we
have:

$\sigma_{11} = \phi(Y, x, a_1 b_1)$

$\sigma_{12} = \phi(Y, x, a_1 b_2)$

$\sigma_{13} = \phi(Y, x, a_1 b_3)$

Restricting the parameters to a, b, and holding a constant, we observe that
the values $\sigma_{11}^2$, $\sigma_{12}^2$, $\sigma_{13}^2$ must vary with the assigned values of the parameter b,
and take a minimum value when b takes its true or most probable value. As the
errors in the approximation to b increase positively and negatively without limit,
the computed values of $\sigma^2$ will tend toward the infinite. They may, therefore,
be assumed to lie on the arc of a parabola whose equation is a quadratic function 
of xaib; hence, we may form the following equations of representation: 

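$\sigma_{11}^2 = k_{11} + l_{11} a_1 + m_{11} a_1^2$.

$\sigma_{12}^2 = k_{12} + l_{12} a_1 + m_{12} a_1^2$.

$\sigma_{13}^2 = k_{13} + l_{13} a_1 + m_{13} a_1^2$.

By addition, we have,

$\sigma_{11}^2 + \sigma_{12}^2 + \sigma_{13}^2 = k_{11} + k_{12} + k_{13} + (l_{11} + l_{12} + l_{13}) a_1 + (m_{11} + m_{12} + m_{13}) a_1^2$.

By appropriate variations in subscript, similar equations may be written in
$a_2$ and $a_3$, thus:

$\sigma_{21}^2 + \sigma_{22}^2 + \sigma_{23}^2 = k_{21} + k_{22} + k_{23} + (l_{21} + l_{22} + l_{23}) a_2 + (m_{21} + m_{22} + m_{23}) a_2^2$.

$\sigma_{31}^2 + \sigma_{32}^2 + \sigma_{33}^2 = k_{31} + k_{32} + k_{33} + (l_{31} + l_{32} + l_{33}) a_3 + (m_{31} + m_{32} + m_{33}) a_3^2$.

These three equations are all of the quadratic form, and may be conveniently
written as follows:

$A_1 = K_1 + L_1 a_1 + M_1 a_1^2$.

$A_2 = K_1 + L_1 a_2 + M_1 a_2^2$.

$A_3 = K_1 + L_1 a_3 + M_1 a_3^2$.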





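By precisely similar reasoning, the following equations in b may be developed:

$B_1 = K_2 + L_2 b_1 + M_2 b_1^2$.

$B_2 = K_2 + L_2 b_2 + M_2 b_2^2$.

$B_3 = K_2 + L_2 b_3 + M_2 b_3^2$,

where

$B_1 = \sigma_{11}^2 + \sigma_{21}^2 + \sigma_{31}^2$; $B_2 = \sigma_{12}^2 + \sigma_{22}^2 + \sigma_{32}^2$; $B_3 = \sigma_{13}^2 + \sigma_{23}^2 + \sigma_{33}^2$.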


Since the values of $a_1, a_2, a_3$ and $b_1, b_2, b_3$ are assigned, the two sets of equations
may each be simultaneously solved to obtain values for $K_1, L_1, M_1$ and $K_2, L_2, M_2$.
To obtain the conditions for A = a minimum, B = a minimum, we differentiate
with respect to a and b, as follows:

$D_a(A) = L_1 + 2M_1 a$; $D_b(B) = L_2 + 2M_2 b$.

Setting these two equations equal to zero and solving, we obtain the adjusted
values of a and b, thus:

$a = -L_1 \div 2M_1$; $b = -L_2 \div 2M_2$.
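The quadratic technique amounts to passing a parabola through three points (trial value, summed squared error) and taking its vertex. The sketch below illustrates this for a single parameter. The sample sums are synthetic, chosen so that the minimum lies at 3, and the function names are hypothetical.

```python
def fit_parabola(xs, ys):
    """Coefficients (K, L, M) of y = K + L*x + M*x**2 through three points."""
    (x1, x2, x3), (y1, y2, y3) = xs, ys
    # Newton divided differences.
    d12 = (y2 - y1) / (x2 - x1)
    d23 = (y3 - y2) / (x3 - x2)
    M = (d23 - d12) / (x3 - x1)
    L = d12 - M * (x1 + x2)
    K = y1 - L * x1 - M * x1 ** 2
    return K, L, M


def adjusted_value(xs, ys):
    """Abscissa of the minimum, -L / (2M); fails if the parabola opens downward."""
    _, L, M = fit_parabola(xs, ys)
    if M <= 0:
        raise ValueError("fitted parabola has no minimum")
    return -L / (2 * M)


# Synthetic sums of squared standard errors with a true minimum at a = 3.
trial_a = (1.0, 2.0, 5.0)
sums_A = tuple(2 + (a - 3) ** 2 for a in trial_a)
a_hat = adjusted_value(trial_a, sums_A)
print(a_hat)
```

The guard on M reflects the premise of the text that the sums rise without limit on either side of the true value; if the three trial values are badly placed the fitted curvature can come out negative, and the trial range should then be changed.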

The extension of this method to the case of p parameters is obvious. Assigning
three approximations to each parameter, we hold constant a value of one
parameter (say $a_1$), and form all possible combinations of subscripts for the
remaining parameters ($b_1 b_2 b_3$ with $c_1 c_2 c_3$ with etc.). This will yield $3^{p-1}$ values
of $\sigma^2$, each of which is associated with $a_1$. Repeating this process, we can form
similar sets of values of $\sigma^2$ by association with $a_2$ and $a_3$. We can then form the
sums $A_1 = \Sigma\sigma^2(Y x a_1 b c \cdots)$; $A_2 = \Sigma\sigma^2(Y x a_2 b c \cdots)$; $A_3 = \Sigma\sigma^2(Y x a_3 b c \cdots)$. In all,
$3 \times 3^{p-1}$ or $3^p$ distinct determinations of $\sigma^2$ will be required. In like manner,
the equations for $B_1, B_2, B_3$ and $C_1, C_2, C_3$, etc. are formed. The solutions for the
adjusted values a, b, c, ··· follow directly.

Since the method of solution given in Part I requires the computation of but
$2^p$ values of $\sigma^2$, it is evident that the method of this section is the more onerous
when considering the determination of a single set of adjusted values of
parameters, the excess being of the order $3^p : 2^p = (1.5)^p$. However, being more
precise, the present method will require fewer approximations to arrive at
satisfactory values of the parameters sought. In other words, the mathematical
advantage of economy lies with the theta technique; while the advantage of
precision lies with the quadratic technique.


SELECTED BIBLIOGRAPHY 

Dublin, L. I. and Lotka, A. J. On the True Rate of Natural Increase. J. A. S. A. S.
1925.

Dublin, L. I. and Lotka, A. J. The True Rate of Natural Increase of the Population of
the United States. Metron, Je. 1930.

Hotelling, H. Differential Equations Subject to Error and Population Estimates. J.
A. S. A. 1927.

Hotelling, H. and F. Causes of Birth Rate Fluctuations. J. A. S. A. 1931.

Knibbs, G. H. Laws of Growth of a Population. J. A. S. A. 1926-27.

Lehfeldt, R. A. The Normal Law of Progress. J. R. S. S. 1916.

Lotka, A. J. Studies on the Mode of Growth of Material Aggregates. A. J. Sci. 1907.

Pearl, R. and Reed, L. J. On the Rate of Growth of the Population of the United States
since 1790 and Its Mathematical Representation. Proc. N. A. Sci. 1920.

Pearl, R. and Reed, L. J. On the Mathematical Theory of Population Growth. Metron, 1923.

Will, H. S. On Fitting Curves to Observational Series by the Method of Differences.
Ann. M. S. My. 1930.

Wolfe, A. B. Is there a Biological Law of Human Population Growth? Q. J. Ec. Ag.
1927.

Yule, G. U. The Growth of Population and the Factors which Control It. J. R. S. S.
Jn. 1924.



ON A METHOD FOR EVALUATING THE MOMENTS OF A 
BERNOULLI DISTRIBUTION¹

By Everett H. Larguier, S.J. 

1. The moments (per unit frequency) of a frequency distribution have long
been regarded as useful characteristics of the distribution. If we denote the
s-th moment about the arithmetic mean by $\mu_s$, we have for the Bernoulli distribution

$\mu_s = \sum_{x=0}^{n} \bar{x}^s f(x)$,

where $\bar{x} = x - np$ and $f(x) = \binom{n}{x} p^x q^{n-x}$.

To evaluate the s-th moment about the arithmetic mean has always been a
laborious task. Karl Pearson² gave the s-th moment about the arithmetic
mean as,

(1) + 

which he said at that time was perhaps the easiest expression for obtaining these
moment coefficients by successive differentiation. Romanovsky,³ however,
was able to develop the recursion formula,

(2) $\mu_{s+1} = pq\left[ns\,\mu_{s-1} + \dfrac{d\mu_s}{dp}\right]$

for the moments about the mean. Another relation for these moments is

(3) $\mu_{s+1} = \sum_{i=0}^{s-1} \binom{s}{i}\left[npq\,\mu_i - p\,\mu_{i+1}\right]$.

Recently Kirkham⁴ gave the expressions for the first eight moments which,
however, are not in a form well adapted for numerical calculation on a machine.



¹ Presented to the American Mathematical Society, January 2, 1936.

² Karl Pearson, Biometrika, vol. 12 (1918-1919), footnote, p. 270. This expression is
obtained from the moment-generating function. Obviously this method is exceedingly
impractical for numerical calculations.

³ V. Romanovsky, “Note on the moments of the binomial $(p + q)^n$ about its mean,”
Biometrika, vol. 15 (1923). Recently this expression was given a simple proof by A. T.
Craig (Bull. Amer. Math. Soc., vol. 40, pp. 262-264) and extended to the Poisson case.

⁴ W. J. Kirkham, “Moments about the arithmetic mean of a binomial frequency
distribution,” Annals of Mathematical Statistics, vol. VI, pp. 96-101.



192 


EVERETT H. LARGUIER


2. It is the purpose of this paper to express the s-th moment about the
arithmetic mean in the form

(4) $\mu_s = \sum_{t=1}^{s} F_{s,t}(n)\, p^t$,

where $F_{s,t}(n)$ are determinable functions of n dependent on s and t. We note
here that p and q are the probabilities of the success and failure of an event in
a single trial.

Since we know that $\mu_2 = npq$ and $\mu_1 = 0$, it is evident that the part of (2)
enclosed in [ ] will be of degree 2 less than s + 1 in p and hence (4) will satisfy
as a representation of the moment.

3. To obtain a recursion formula for the functions $F_{s,t}(n)$ we differentiate (4)
with respect to p. This gives

$\dfrac{d\mu_s}{dp} = \sum_{t=1}^{s} t F_{s,t}(n)\, p^{t-1}$.

By (2) we may then write

$\sum_{t=1}^{s+1} F_{s+1,t}(n)\, p^t = p(1 - p)\, ns \sum_{t=1}^{s-1} F_{s-1,t}(n)\, p^t + p(1 - p) \sum_{t=1}^{s} t F_{s,t}(n)\, p^{t-1}$

$= ns \sum_{t=2}^{s} F_{s-1,t-1}(n)\, p^t - ns \sum_{t=3}^{s+1} F_{s-1,t-2}(n)\, p^t + \sum_{t=1}^{s} t F_{s,t}(n)\, p^t - \sum_{t=2}^{s+1} (t - 1) F_{s,t-1}(n)\, p^t$.

Since this is an identity in p, we have immediately the following recursion
formula for determining $F_{s,t}(n)$:

(5) $F_{s,t}(n) = n(s - 1)F_{s-2,t-1}(n) - n(s - 1)F_{s-2,t-2}(n) + tF_{s-1,t}(n) - (t - 1)F_{s-1,t-1}(n)$

in which

(6) $F_{0,0}(n) = 1$; and $F_{s,t}(n) = 0$ for $t > s$; for $t < 1$, $s > 0$; and for $t = 1$, $s = 1$.

These definitions arise from the known values of the moments and the
conditions imposed by the identity in p.

By means of (5) and (6) we are able to obtain very readily the values for
$F_{s,t}(n)$ which are given in Table I.
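The recursion (5) with the boundary conditions (6) is easy to mechanize. The following sketch is an illustration only: it evaluates $F_{s,t}(n)$ for a numerical n and compares the resulting moments with a direct summation over the binomial distribution, using exact rational arithmetic. It assumes, in addition to (6), that $F_{0,t}(n) = 0$ for every t other than 0, which the recursion needs at s = 2. The function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def F(s, t, n):
    """F_{s,t}(n) from recursion (5) with boundary conditions (6)."""
    if s == 0:
        return 1 if t == 0 else 0        # assumption: F_{0,t} = 0 for t != 0
    if t < 1 or t > s or (s == 1 and t == 1):
        return 0
    return (n * (s - 1) * F(s - 2, t - 1, n)
            - n * (s - 1) * F(s - 2, t - 2, n)
            + t * F(s - 1, t, n)
            - (t - 1) * F(s - 1, t - 1, n))


def moment_from_F(s, n, p):
    """s-th central moment via equation (4)."""
    return sum(F(s, t, n) * p ** t for t in range(1, s + 1))


def moment_direct(s, n, p):
    """s-th central moment by direct summation over the binomial terms."""
    q = 1 - p
    return sum((x - n * p) ** s * comb(n, x) * p ** x * q ** (n - x)
               for x in range(n + 1))


n, p = 10, Fraction(1, 3)
for s in range(2, 9):
    assert moment_from_F(s, n, p) == moment_direct(s, n, p)
print(F(4, 2, 378), F(6, 3, 378))
```

Working with `Fraction` keeps the comparison exact, so any discrepancy would indicate an error in the recursion rather than rounding.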



EVALUATING MOMENTS OF BERNOULLI DISTRIBUTION 


193 


TABLE I

Values of $F_{s,t}(n)$

s   F_{s,1}(n)   F_{s,2}(n)        F_{s,3}(n)                  F_{s,4}(n)
1   0            0                 0                           0
2   n            -n                0                           0
3   n            -3n               2n                          0
4   n            -7n + 3n^2        12n - 6n^2                  -6n + 3n^2
5   n            -15n + 10n^2      50n - 40n^2                 -60n + 50n^2
6   n            -31n + 25n^2      180n - 180n^2 + 15n^3       -390n + 415n^2 - 45n^3
7   n            -63n + 56n^2      602n - 686n^2 + 105n^3      -2100n + 2590n^2 - 525n^3
8   n            -127n + 119n^2    1932n - 2394n^2 + 490n^3    -10206n + 13895n^2 - 3850n^3 + 105n^4

s   F_{s,5}(n)                                   F_{s,6}(n)
1   0                                            0
2   0                                            0
3   0                                            0
4   0                                            0
5   24n - 20n^2                                  0
6   360n - 390n^2 + 45n^3                        -120n + 130n^2 - 15n^3
7   3360n - 4270n^2 + 945n^3                     -2520n + 3234n^2 - 735n^3
8   25200n - 35700n^2 + 10990n^3 - 420n^4        -31920n + 46004n^2 - 14770n^3 + 630n^4

s   F_{s,7}(n)                                   F_{s,8}(n)
1   0                                            0
2   0                                            0
3   0                                            0
4   0                                            0
5   0                                            0
6   0                                            0
7   720n - 924n^2 + 210n^3                       0
8   20160n - 29232n^2 + 9520n^3 - 420n^4         -5040n + 7308n^2 - 2380n^3 + 105n^4


With this table it is a relatively easy task to evaluate the first eight moments 
with the aid of a calculating machine. 

4. As an illustration of the preceding we propose to evaluate the first eight
moments about the arithmetic mean for the binomial, $(.06785 + .93215)^{378}$.
We first evaluate the coefficients $F_{s,t}(n)$.










TABLE II⁵

Values of $F_{s,t}(378)$

s   F_{s,1}(378)   F_{s,2}(378)     F_{s,3}(378)       F_{s,4}(378)            F_{s,5}(378)
1        0               0                0                  0                       0
2      378            -378                0                  0                       0
3      378          -1,134              756                  0                       0
4      378         426,006         -852,768            426,384                       0
5      378       1,423,170       -5,696,460          7,121,520              -2,848,608
6      378       3,560,382      784,501,200     -2,371,307,400           2,374,868,160
7      378       7,977,690    5,573,275,090    -27,986,054,000          50,430,749,000
8      378      16,955,190   26,123,640,500  1,937,705,370,000      -7,986,171,610,000

s   F_{s,6}(378)            F_{s,7}(378)            F_{s,8}(378)
1            0                       0                       0
2            0                       0                       0
3            0                       0                       0
4            0                       0                       0
5            0                       0                       0
6   -791,622,720                     0                       0
7   -39,236,327,400         11,210,379,300                   0
8   12,070,808,800,000      -8,064,644,270,000      2,016,161,070,000



Then running off the powers of p, we have:

p   = .067 85                      p^5 = .000 001 437 968 13
p^2 = .004 603 622 5               p^6 = .000 000 097 566 137 6
p^3 = .000 312 355 787             p^7 = .000 000 006 619 862 44
p^4 = .000 021 193 340 1           p^8 = .000 000 000 449 157 667

Applying (4) we have


⁵ In this table, as well as in the one that follows, all values are correct to nine
significant figures.









TABLE III

Values of $p^t F_{s,t}(378)$

s                          2              3              4              5
p F_{s,1}(378)          25.6473        25.6473        25.6473        25.6473
p^2 F_{s,2}(378)        -1.7401693     -5.2205079   1961.17087     6551.73743
p^3 F_{s,3}(378)         0.              .2361410   -266.36702    -1779.32225
p^4 F_{s,4}(378)         0.             0.             9.03650      150.92880
p^5 F_{s,5}(378)         0.             0.             0.            -4.09621
p^6 F_{s,6}(378)         0.             0.             0.             0.
p^7 F_{s,7}(378)         0.             0.             0.             0.
p^8 F_{s,8}(378)         0.             0.             0.             0.
μ_s                     23.9071307     20.6629331   1729.48765     4944.89507

s                          6               7                8
p F_{s,1}(378)            25.647           25.65            25.6
p^2 F_{s,2}(378)       16390.655        36726.27         78055.3
p^3 F_{s,3}(378)      245043.490      1740844.73       8159870.3
p^4 F_{s,4}(378)      -50255.924      -593117.96      41066448.9
p^5 F_{s,5}(378)        3414.985        72517.81     -11483860.3
p^6 F_{s,6}(378)         -77.236        -3828.14       1177702.2
p^7 F_{s,7}(378)           0.              74.21        -53386.8
p^8 F_{s,8}(378)           0.               0.             905.6
μ_s                   214541.617      1253242.57      38945760.8



This gives us the desired moments about the arithmetic mean of the binomial
$(.06785 + .93215)^{378}$. These values may be rapidly checked by applying (3)
to $\mu_8$.
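As an independent check on the illustration, the lower moments can also be obtained by brute-force summation over all 379 terms of the binomial. The sketch below does this with exact rational arithmetic and converts to floating point only at the end. It is not the author's procedure, merely a cross-check, and the names are hypothetical.

```python
from fractions import Fraction
from math import comb

n = 378
p = Fraction(6785, 100000)   # p = .06785 held exactly
q = 1 - p
mean = n * p

# Exact binomial probabilities for x = 0, 1, ..., n.
pmf = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]


def central_moment(s):
    """s-th moment about the arithmetic mean, summed term by term."""
    return float(sum((x - mean) ** s * w for x, w in enumerate(pmf)))


for s in range(2, 6):
    print(s, central_moment(s))
```

Agreement with the sums in the last row of each panel of Table III confirms both the tabulated coefficients and the positions of the decimal points.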


Saint Louis University 
Saint Louis, Missouri. 



A METHOD OF DETERMINING THE REGRESSION CURVE WHEN 
THE MARGINAL DISTRIBUTION IS OF THE NORMAL 
LOGARITHMIC TYPE 


By Carl-Erik Quensel

Assistant at the Statistical Institute of the University of Lund, Sweden 


In a paper¹ in this Journal Professor S. D. Wicksell gave the general outlines
of a new method of calculating the regression lines. This problem was later
on treated in detail by Dr. Walter Andersson.² His method was to develop
the formulas for the regression lines into a series of orthogonal polynomials
under the assumption that the marginal distribution of the independent
variate belonged to certain mathematically defined distributions, and to
determine the constants with the aid of the method of the least squares.

Among other cases he treated also the case where the marginal distribution
was of the normal logarithmic type:

(1) $F(x) = \dfrac{\log e}{\sigma_1 \sqrt{2\pi}\,(x - a)}\, e^{-\frac{1}{2}\left[\frac{\log (x - a) - l}{\sigma_1}\right]^2}$

But as his method is entirely different from the method I shall give here, I
will not go any further into the method used by Dr. Andersson.

When the correlation surface F(x, y) of the variates x and y is given and then
of course also the marginal distribution of x, F(x), it is known that the mean
$\bar{y}_x$ of the dependent variate y in an infinitely small array with the value of x
between x and x + dx is given as a function of the independent variate x by
the following formula (2)

(2) $\bar{y}_x = \dfrac{\int y F(x, y)\, dy}{\int F(x, y)\, dy}$.

In this formula the integrals are to be extended over the whole domain of
the variation of y.

If now we make any transformation of x by introducing a new variate u,
related to x by the formula u = ψ(x), where we must suppose that u is a
one-valued function of x and conversely, the distribution f(u, y) of the variates u and
y is given by the relation

(3) $f(u, y)\, du\, dy = F(x, y)\, dx\, dy$


¹ S. D. Wicksell. Remarks on Regression. Annals of Mathematical Statistics, 1930.
² Walter Andersson. Researches into the Theory of Regression. Meddelande från
Lunds Astronomiska Observatorium. Ser. II. N:r 64.



Writing the formula (2) in the following form:

$$\bar{y}_x = \frac{\int y\,F(x, y)\,dx\,dy}{\int F(x, y)\,dx\,dy},$$

we see at once that the mean $\bar{y}_x$ can be given as the following function of u:

(4) $$\bar{y}_x = \frac{\int y\,f(u, y)\,dy}{\int f(u, y)\,dy}.$$


This relation is, of course, self-evident. The mean of the dependent variate
in an array of the independent variate is unchanged when we exchange the
variate x for another variate u, related to x by a one-valued function.

The problem of finding the regression line of the mean $\bar{y}_x$ can in this
way be much simplified, if it is possible to make a favorable transformation of
the independent variate x.

As shown by Professor Wicksell,³ we may, under certain conditions concerning
the marginal distribution f(u), write the expression of the regression line in
the following form:

(5) $$\bar{y}_x = \sum_{n} (-1)^n\,\frac{\lambda_{n,1}}{n!}\,\frac{f^{(n)}(u)}{f(u)},$$

where the coefficients $\lambda_{n,1}$ are the seminvariants of the distribution of u and y.

The conditions which the function f(u) must satisfy are, among others, that
the function and all its derivatives are continuous in the domain of variation,
and that the function and its derivatives vanish at the limits of that domain.
These conditions are satisfied by the normal curve of error.

In the case where the distribution of u is normal, the derivatives take
the following form:

(6) $$f^{(n)}(u) = (-1)^n H_n(u)\,f(u),$$

where the expressions $H_n(u)$ are the well-known Hermitian polynomials.

The formula (5) takes the following simple form:

(7) $$\bar{y}_x = \sum_{n=0}^{\infty} \frac{\lambda_{n,1}}{n!}\,H_n(u).$$

If we can change the given marginal distribution F(x) by a favorable
substitution u = ψ(x) into a normal curve, and if, this substitution made, we can


³ S. D. Wicksell. Analytical Theory of Regression. Meddelande från Lunds
Astronomiska Observatorium. Ser. II. N:r 69.





calculate the coefficients $\lambda_{n,1}$ from the moments or other known
characteristics of the given correlation distribution, F(x, y), it is possible to
express the regression line as the formula (8) shows:

(8) $$\bar{y}_x = \sum_{n=0}^{\infty} \frac{\lambda_{n,1}}{n!}\,H_n[\psi(x)].$$

It must be observed that the polynomials $H_n[\psi(x)]$ are orthogonal with regard
to the distribution F(x) of the independent variate x. We have

$$\int H_i[\psi(x)]\,H_j[\psi(x)]\,F(x)\,dx = \int H_i(u)\,H_j(u)\,f(u)\,du = 0, \qquad i \neq j.$$


It will perhaps not be possible in all cases to calculate the $\lambda_{n,1}$
coefficients when we have transformed the marginal distribution into the normal
curve, but in one case it is rather simple to calculate these coefficients from
the given moments.

The case alluded to is the one where the variate u is given from x by the
relation u = log(x - a), that is, where the marginal distribution is of the
so-called normal logarithmic type (1).

In that case it is possible to calculate the $\lambda_{n,1}$ coefficients from the
marginal moments $\nu'_{n,0}$ and from the correlation moments of the type $\nu'_{n,1}$.

We suppose that the marginal distribution is of the logarithmic type and
that from the moments of the x distribution we have determined the three
constants a, $\sigma_1$ and l in the usual manner.⁴

Then we calculate from the given correlation distribution the moments $\nu'_{n,0}$
about the point x = a and the correlation moments $\nu'_{n,1}$ about the point
x = a and y = $m_y$ (the mean value of the variate y).

From these moments it is possible to calculate the $\lambda_{n,1}$ coefficients in the
following way.

The characteristic function of u and y is given by the following relation:

(9) $$U(\omega_1, \omega_2) = \iint e^{i\omega_1 u + i\omega_2 y}\,f(u, y)\,du\,dy,$$

where the integrals are extended over the whole domain of variation.

If the distribution of u is according to the normal law, we have $\lambda_{j,0} = 0$ for
$j \ge 3$, but in the calculations here it is not at all necessary to suppose anything
about these higher seminvariants. On the other hand, the correlation
distribution f(u, y) is obtained from the characteristic function by the inversion
theorem:

(10) $$f(u, y) = \frac{1}{4\pi^2} \iint e^{-i\omega_1 u - i\omega_2 y}\,U(\omega_1, \omega_2)\,d\omega_1\,d\omega_2.$$


⁴ How these are to be determined is shown in Pae-Tsi Yuan. On the Logarithmic
Frequency Distribution and the Semi-Logarithmic Correlation Surface. Annals of
Mathematical Statistics, 1933.





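But we can also get the following relation:

(11) $$\int e^{i\omega_1 u} f(u, y)\,du = \frac{1}{2\pi} \int e^{-i\omega_2 y}\,U(\omega_1, \omega_2)\,d\omega_2.$$

Of this last expression (11) between the characteristic function and the
distribution function I will make use in the following.

The moments $\nu'_{i,j}$ of the distribution F(x, y) about the point x = a, y = $m_y$
are given by the formula

(12) $$\nu'_{i,j} = \iint (x - a)^i\,(y - m_y)^j\,F(x, y)\,dx\,dy.$$

If we write y instead of y - $m_y$, and instead of x - a we write $e^{bu}$ (b = 1/log e),
the expression (12) takes the following form:

(13) $$\nu'_{i,j} = \iint e^{ibu}\,y^j\,f(u, y)\,du\,dy.$$

For the marginal moments of x about the point x = a we get

(14) $$\nu'_{n,0} = \int (x - a)^n F(x)\,dx = \int e^{nbu} f(u)\,du.$$

Comparing this formula (14) with the expression for the characteristic function
of the distribution f(u),

(15) $$U(\omega_1) = \int e^{i\omega_1 u} f(u)\,du = e^{\sum_j \frac{\lambda_{j,0}}{j!}(i\omega_1)^j},$$

we find the following simple relation:

(16) $$\nu'_{n,0} = e^{\sum_j \frac{\lambda_{j,0}}{j!}(nb)^j}.$$

For the moments of the type $\nu'_{n,1}$ we get

(17) $$\nu'_{n,1} = \iint e^{nbu}\,y\,f(u, y)\,du\,dy = \int y\,dy \int e^{nbu} f(u, y)\,du.$$

If we compare the last integral in the formula (17), $\int e^{nbu} f(u, y)\,du$, with the
formula (11), we see that we can write (17) as follows:

(18) $$\nu'_{n,1} = \frac{1}{2\pi} \int y\,dy \int e^{-i\omega_2 y}\; e^{\sum \frac{\lambda_{j,l}}{j!\,l!}(nb)^j (i\omega_2)^l}\,d\omega_2.$$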

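From the sum $\sum \frac{\lambda_{j,l}}{j!\,l!}(nb)^j (i\omega_2)^l$ we may take out the part
$\sum_j \frac{\lambda_{j,0}}{j!}(nb)^j$, where l is zero and which therefore does not contain
any power of $\omega_2$, and write the remainder in the following form:

$$\sum_{l \ge 1} \frac{\lambda'_l}{l!}\,(i\omega_2)^l,$$

where we have

$$\lambda'_1 = \lambda_{1,1}\,nb + \frac{\lambda_{2,1}}{2!}(nb)^2 + \frac{\lambda_{3,1}}{3!}(nb)^3 + \cdots,$$

$$\lambda'_2 = \lambda_{0,2} + \lambda_{1,2}\,nb + \frac{\lambda_{2,2}}{2!}(nb)^2 + \cdots.$$

The integral $\frac{1}{2\pi}\int e^{-i\omega_2 y + \sum_l \frac{\lambda'_l}{l!}(i\omega_2)^l}\,d\omega_2$ may be considered as a
frequency distribution φ(y) with the seminvariants $\lambda'_l$.
The formula (18) will thus be written

(19) $$\nu'_{n,1} = e^{\sum_j \frac{\lambda_{j,0}}{j!}(nb)^j} \int y\,\varphi(y)\,dy.$$

According to (16) we have

$$\nu'_{n,0} = e^{\sum_j \frac{\lambda_{j,0}}{j!}(nb)^j},$$

and as

$$\int y\,\varphi(y)\,dy = \lambda'_1 = \lambda_{1,1}\,nb + \frac{\lambda_{2,1}}{2!}(nb)^2 + \frac{\lambda_{3,1}}{3!}(nb)^3 + \cdots,$$

we get

(20) $$\nu'_{n,1} = \nu'_{n,0}\cdot\lambda'_1,$$

or

(21) $$\frac{\nu'_{n,1}}{\nu'_{n,0}} = \lambda_{1,1}\,nb + \frac{\lambda_{2,1}}{2!}(nb)^2 + \frac{\lambda_{3,1}}{3!}(nb)^3 + \cdots.$$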


We see that all the seminvariants $\lambda_{n,1}$ are involved in the formulas for
$\nu'_{n,1}$. A successive determination of the seminvariants $\lambda_{n,1}$ with the aid
of the moments of the same and lower degree is therefore not possible.

However, when we use the formula (8) for the regression, we must suppose
that the seminvariants $\lambda_{n,1}$ converge rather soon towards zero with growing n.

If the successive differences of the quotients $\nu'_{n,1}/\nu'_{n,0}$ are calculated, it
may be possible to judge how far it is possible to go with success. These
differences will in most cases diminish rather soon, and we shall therefore in
most cases get a value of n for which we can suppose that the differences of
higher order than this will all be so small that they can be neglected; as a
consequence, all higher seminvariants can be neglected too.





When this value of n has been determined, the first n seminvariants will
all be obtained from the first n quotients $\nu'_{i,1}/\nu'_{i,0}$.
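As a minimal sketch of this last step, assume the series (21) is cut off after its n-th term. The first n quotients then give n linear equations in the seminvariants, which can be solved directly. The function names, the use of exact fractions, and the numerical values are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction
from math import factorial


def solve_linear(a, rhs):
    """Solve a square linear system by Gauss-Jordan elimination (exact fractions)."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        # Pick a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def seminvariants_from_quotients(quotients, b):
    """Recover lambda_{1,1}..lambda_{n,1} from the quotients nu'_{k,1}/nu'_{k,0}, k = 1..n.

    Assumes relation (21) truncated after n terms:
        quotient_k = sum_{j=1..n} lambda_{j,1} * (k*b)**j / j!
    """
    b = Fraction(b)
    n_max = len(quotients)
    a = [[(k * b) ** j / factorial(j) for j in range(1, n_max + 1)]
         for k in range(1, n_max + 1)]
    return solve_linear(a, [Fraction(q) for q in quotients])


# Hypothetical seminvariants and a hypothetical b, used only to check the round trip.
true_lambdas = [Fraction(3, 10), Fraction(-1, 20), Fraction(1, 100)]
b_demo = Fraction(23, 10)
quotients_demo = [
    sum(lam * (k * b_demo) ** j / factorial(j) for j, lam in enumerate(true_lambdas, 1))
    for k in (1, 2, 3)
]
recovered = seminvariants_from_quotients(quotients_demo, b_demo)
print(recovered == true_lambdas)
```

Exact rational arithmetic is used here only so that the round trip can be checked without rounding; with observed moments, floating-point values would serve equally well.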

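Thus we finally get the regression line as follows:

$$\bar{y}_x = \sum_{i=1}^{n} \frac{\lambda_{i,1}}{i!}\,H_i[\log (x - a) - l],$$

or in standardized units:

$$\bar{y}_x = m_y + \sum_{i=1}^{n} \frac{\lambda_{i,1}}{i!\,\sigma_1^i}\,H_i\!\left[\frac{\log (x - a) - l}{\sigma_1}\right].$$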
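The standardized form of the regression line can be evaluated numerically once the constants are known. The sketch below assumes that the Hermitian polynomials are those belonging to the normal curve, generated by the recurrence H_{k+1}(u) = u H_k(u) - k H_{k-1}(u), and that the logarithm is taken to base 10 (suggested by the factor log e in (1)). The function names and the constants in the example are hypothetical.

```python
from math import factorial, log10


def hermite(n, u):
    """Hermite polynomial of the normal curve: H_0 = 1, H_1 = u, H_{k+1} = u*H_k - k*H_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, u
    for k in range(1, n):
        h_prev, h = h, u * h - k * h_prev
    return h


def regression_mean(x, a, l, sigma1, m_y, lambdas):
    """Mean of y in the array at x, from the standardized series.

    lambdas[i-1] holds lambda_{i,1}; the series is cut off after len(lambdas) terms.
    """
    u = (log10(x - a) - l) / sigma1
    series = sum(
        lam / (factorial(i) * sigma1 ** i) * hermite(i, u)
        for i, lam in enumerate(lambdas, 1)
    )
    return m_y + series


# Hypothetical constants: a = 0, l = 1, sigma_1 = 0.5, m_y = 5.
print(regression_mean(10.0, 0.0, 1.0, 0.5, 5.0, [0.5, 0.2]))
```

At x - a = 10^l the standardized argument is zero, so only the even-order terms contribute.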



THE STANDARD ERROR OF A “SOCIAL FORCE”
By Stuart C. Dodd
I. Definitions


In the theory of measurement of social forces certain special cases of frequent 
occurrence where the population shifts from one date of measurement to the 
next require the derivation of appropriate standard error formulae. 

The theory may be briefly restated¹ in equations as follows: any measurable
social change, C, in a population, P, may be defined as the difference in mean
scores, S, from surveys or measurements on the dates denoted by subscripts:

$$C_{2-1} = S_2 - S_1 \qquad (1)$$


The momentum of a social change may be defined as the product of its time 
rate in years and the population that is being changed 

$$M_{2-1} = P\,V_{2-1} \qquad (2)$$

$$= \frac{P\,C_{2-1}}{Y_{2-1}} = \frac{P}{Y_{2-1}}\,(S_2 - S_1) \qquad (2a)$$


where $Y_{2-1}$ is the period from date 1 to date 2 and V is the velocity, or speed
of change, in that period. The acceleration of a social change is definable as
the rate of change of the velocity of change:

$$A = \frac{V_{4-3} - V_{2-1}}{.5\,Y_{(4-3-2+1)}} \qquad (3)$$


where each velocity, being an average for its period, is taken as representing 
the mid-date of that period. 

The resultant social force which produces a measured change is now definable 
as that which accelerates the change in a population. It is measurable as the 
product of the acceleration and the population.²

$$F = AP \qquad (4)$$

$$= \frac{P}{.5\,Y_{(4-3-2+1)}}\left(\frac{S_4}{Y_{4-3}} - \frac{S_3}{Y_{4-3}} - \frac{S_2}{Y_{2-1}} + \frac{S_1}{Y_{2-1}}\right) \qquad (5)$$
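The definitions (1) to (5) translate directly into a few small functions. This is only an illustrative sketch: the function and argument names are assumptions, the interval between mid-dates is passed in explicitly, and the three-survey figures in the example (a third mean score of 400 in 1935) are invented for the illustration.

```python
def social_change(s_first, s_last):
    """Equation (1): difference in mean scores between two dates."""
    return s_last - s_first


def velocity(s_first, s_last, years):
    """Time rate of the change over a period of `years` years."""
    return social_change(s_first, s_last) / years


def momentum(population, s_first, s_last, years):
    """Equations (2) and (2a): population times velocity."""
    return population * velocity(s_first, s_last, years)


def acceleration(v_early, v_late, years_between_mid_dates):
    """Equation (3): change in velocity per year between the mid-dates of the two periods."""
    return (v_late - v_early) / years_between_mid_dates


def force(population, accel):
    """Equation (4): acceleration times population."""
    return accel * population


# Two surveys two years apart, average population 43, mean scores 253 and 304.
m = momentum(43, 253, 304, 2)

# Hypothetical third survey two years later with mean score 400.
v_21 = velocity(253, 304, 2)
v_32 = velocity(304, 400, 2)
a = acceleration(v_21, v_32, 0.5 * 4)  # mid-dates lie half of the four-year span apart
f = force(43, a)
print(m, a, f)
```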


¹ A Controlled Experiment on Rural Hygiene in Syria, Dodd, S. C., Publications of the
American University of Beirut, Syria, Social Science Series No. 7, 1934, pp. 336.

Also, A Theory for the Measurement of Some Social Forces, Dodd, S. C., Scientific
Monthly, Vol. XLIII, No. 1, July 1936, pp. 58-62.

² Force thus defined in terms of its effect is a resultant force, i.e., the residual force after
deducting all resisting forces from the total force in the direction of the change observed.
This formula defines quantitatively and exactly the “net” force, not the “gross” force



II. The Sampling error of one case (momentum)

The formulae for the standard errors of sampling for the above concepts,
social change, velocity, momentum, acceleration and force (C, V, M, A, and F),
have been published for the case where the population, P, is the same on all
dates of measurement. But it is not always possible to observe the ideal
experimental technic of holding the population unchanged in number, nor to select
out individuals common to all the surveys and to neglect the rest. Ordinarily
there will be different P's, $P_1$, $P_2$, $P_3$, and $P_4$, at the different dates.

To derive the standard errors of (2) and (4) when P shifts, each P is
considered to be a sub-sample³ of the main sample, which is ($P_1 + P_2 + P_3 + P_4$).
The orthodox view of sampling is taken, where the sub-samples may differ in
size but maintain fixed proportions in each main sample which is drawn from
the “parent” population.

Let primes denote an M, or other function of (1) to (5), which is an
approximation due to the shifting of the population and the use of an average P.

To simplify and generalize the notation, let k denote the constant term
compounded of P's and Y's which is associated with each S. The first subscript
of k denotes the function, f, which is any particular one of the left hand members
of equations (1) to (5), and the second subscript denotes the date of its S. Thus,
from (2a),


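$$k_{M1} = -\frac{P_1 + P_2}{2\,Y_{2-1}} = -k_{M2} \qquad (6)$$

Then (2) may be rewritten:

$$M'_{2-1} = S_1 k_{M1} + S_2 k_{M2} \qquad (7)$$

$$= \sum_{i=1}^{2} S_i\,k_{Mi} \qquad (7a)$$

To derive the standard error of (7) the total differential is:

$$dM'_{2-1} = k_{M1}\,dS_1 + k_{M2}\,dS_2 \qquad (8)$$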


If $Q_{12}$ denotes the population common to both dates of measurement, so that:

$$P_1 = Q_{12} + Q_1, \qquad P_2 = Q_{12} + Q_2 \qquad (9)$$


producing the change. It thus measures only the observable part of the total forces in the 
situation. The fundamental problem remains, as always in science, to observe more 
adequately, to devise experimental and statistical technics for measuring the different 
forces (in isolation and in combinations) which facilitate or resist the measured change. 

³ The author is indebted to Mr. S. S. Wilks (Princeton) for this method of deriving these
standard errors in a fluctuating population.





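and, since the differential of a sum is the sum of the differentials of the several
terms, (8) becomes

$$dM'_{2-1} = \frac{k_{M1}}{P_1}\left(\sum^{Q_{12}} ds_1 + \sum^{Q_1} ds_1\right) + \frac{k_{M2}}{P_2}\left(\sum^{Q_{12}} ds_2 + \sum^{Q_2} ds_2\right) \qquad (10)$$

in which $s_1$ and $s_2$ denote the individual scores whose means are $S_1$ and $S_2$.
Squaring gives

$$(dM'_{2-1})^2 = \frac{k_{M1}^2}{P_1^2}\left(\sum^{P_1} ds_1\right)^2 + \frac{k_{M2}^2}{P_2^2}\left(\sum^{P_2} ds_2\right)^2 + \frac{2\,k_{M1} k_{M2}}{P_1 P_2}\left[\sum^{Q_{12}} ds_1 \sum^{Q_{12}} ds_2 + \sum^{Q_{12}} ds_1 \sum^{Q_2} ds_2 + \sum^{Q_1} ds_1 \sum^{Q_{12}} ds_2 + \sum^{Q_1} ds_1 \sum^{Q_2} ds_2\right] \qquad (11)$$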

On summing and dividing by the number of cases to get the expected values,
the last three terms in the square brackets vanish. Using the relation where,
in random sampling, the correlation between two variables is the same as the
correlation between their means,

$$r_{12} = r_{S_1 S_2} = \frac{\sum s_1 s_2}{Q_{12}\,\sigma_1 \sigma_2} \qquad (12)$$

gives

$$\sigma^2_{M'} = \frac{k_{M1}^2\,\sigma_1^2}{P_1} + \frac{k_{M2}^2\,\sigma_2^2}{P_2} + \frac{2\,k_{M1} k_{M2}\,Q_{12}\,\sigma_1 \sigma_2\,r_{12}}{P_1 P_2} \qquad (13)$$

Standard error of momentum when the population shifts


The best estimates of $\sigma_1$ and $\sigma_2$ are the standard deviations of the scores, $S_1$
and $S_2$, and the best estimate of $r_{12}$ is, strictly, the covariance of the common cases
divided by the two sigmas. Unless the selection of $Q_{12}$ out of $P_1$ and $P_2$
curtails the range in some way (i.e., $Q_{12}$ is not a random selection), then, except for
sampling variation, $\sigma_1$ and $\sigma_2$ are the same in the $Q_{12}$ population as in the $P_1$ and
$P_2$ populations, so that there is only a sampling discrepancy between the ratio
above and $r_{12}$, the observed correlation between the $S_1$ and $S_2$ scores in
the $Q_{12}$ population.


III. The generalized standard error

The above standard error may be readily generalized. Any of the equations 
(1) to (5) may be expressed as a simple linear sum of the products of a variable, 
S, and its appropriate constant, k. 

$$f = \sum_{i=1}^{n} S_i\,k_{fi} \qquad (14)$$





where f is any one of the concepts S, C, A, M or F defined by (1) to (5),
n is the number of surveys, or different S's, involved, and i denotes each
survey in turn from 1 to n. Thus where f means F, (5) becomes:

$$f_F = F' = k_{F1} S_1 + k_{F2} S_2 + k_{F3} S_3 + k_{F4} S_4 = \sum_{i=1}^{4} k_{Fi}\,S_i \qquad (15)$$

where

$$k_{F1} = -k_{F2} = \frac{P_1 + P_2 + P_3 + P_4}{2\,Y_{(4-3-2+1)}\,Y_{(2-1)}} \qquad (16a)$$

$$k_{F4} = -k_{F3} = \frac{P_1 + P_2 + P_3 + P_4}{2\,Y_{(4-3-2+1)}\,Y_{(4-3)}} \qquad (16b)$$


In the special case when a force, F, has been determined from only three 
surveys using two consecutive periods, n = 3 and 


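$$k_{F1} = \frac{P_1 + P_2 + P_3}{1.5\,Y_{(3-1)}\,Y_{(2-1)}} \qquad (16c)$$

$$k_{F2} = -\frac{(P_1 + P_2 + P_3)\,(Y_{(2-1)} + Y_{(3-2)})}{1.5\,Y_{(3-1)}\,Y_{(3-2)}\,Y_{(2-1)}} \qquad (16d)$$

$$k_{F3} = \frac{P_1 + P_2 + P_3}{1.5\,Y_{(3-1)}\,Y_{(3-2)}} \qquad (16e)$$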


If the difference between two forces (or other functions, f) has been measured
in either the same or in different populations and the significance of the
difference in terms of its standard error is desired, f of (14) can also denote that
difference.

$$f_{dF} = F_a - F_b; \quad f_{dM} = M_a - M_b; \quad \text{etc.} \qquad (17)$$

It is only necessary to write the difference as a linear sum of products of S
and k on the model of (2a) or (5) to get the k-values for that particular f.

It is now possible to write the standard error formula for f in a single
generalized form that covers all the concepts and their differences as defined in
equations (1) to (5), (14) and (17). Observing that (14) is the general case for n
surveys of the particular case (7a) where n = 2, it becomes evident that on
taking differentials, squaring, summing, and dividing the linear sum of the n
terms of (14) there result $n^2$ terms, of which n are variances
(times constants) of the sort $\frac{k_{M1}^2 \sigma_1^2}{P_1}$ and $\frac{k_{M2}^2 \sigma_2^2}{P_2}$, and
$\frac{n^2 - n}{2}$ are different terms, each occurring twice, that are covariances
(times constants) of the sort $\frac{k_{M1} k_{M2}\,Q_{12}\,\sigma_1 \sigma_2\,r_{12}}{P_1 P_2}$. From these
rough considerations, as well as from rigorous derivation, the generalized standard
error of (14) is found to be:

$$\sigma_f^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \frac{k_{fi}\,\sigma_i}{P_i}\cdot\frac{k_{fj}\,\sigma_j}{P_j}\;Q_{ij}\,r_{ij} \qquad (18)$$

The generalized standard error.


Where i and j denote each of the n surveys in turn. There will thus be $n^2$
terms to be summed, the number of combinations of i with j including the
cases where i = j.

The derivation of (18) as well as its computation from data and its inter¬ 
pretation in special cases can all be made clearer by arranging the terms in a 
square array as follows: 



  i   Coefficients       j →   1                 2                 ...   n
                               k_f1 σ_1 / P_1    k_f2 σ_2 / P_2    ...   k_fn σ_n / P_n

  1   k_f1 σ_1 / P_1           P_1 ( )           Q_12 r_12 ( )     ...   Q_1n r_1n ( )
  2   k_f2 σ_2 / P_2           Q_12 r_12 ( )     P_2 ( )           ...   Q_2n r_2n ( )
  .   ...                      ...               ...               ...   ...
  n   k_fn σ_n / P_n           Q_1n r_1n ( )     Q_2n r_2n ( )     ...   P_n ( )


To get $\sigma_f^2$, write the computed values of the coefficients as captions of rows
and of columns, and write each computed Qr value in its appropriate cell, noting
that in the main diagonal cells the self-correlations are unities and the
population common to both column and row surveys, $Q_{ij}$, is the entire population
of that survey, as $Q_{ij} = P_i$ when i = j. Thus $Q_{11} = P_1$. Next, in each cell's
parenthesis enter the product of three factors, namely: a) the cell Qr term,
b) the column coefficient, and c) the row coefficient. The sum of these products
in the parentheses, $n^2$ in number, is $\sigma_f^2$ of (18).

From the above square array it becomes clear that whenever in (17) the
difference of two observed forces, or other functions, is derived from different
populations, the Q between these populations is zero, so that the entire product
terms in those cells vanish. Thus in the very simplest and familiar case of
comparing two means from different populations, n = 2, $Q_{12}$ = 0, k = 1, and
(18) reduces to the usual sum of the two variances of the two means:

$$\sigma^2_{\text{difference in means}} = \frac{\sigma_1^2}{P_1} + \frac{\sigma_2^2}{P_2} \qquad (19)$$


IV. Some special cases 

It should be observed that the above formulae for the standard errors when
P shifts all become identical with the simpler formulae previously derived for
the case of a constant P. In this case, every $Q_{pq} = P_p = P_q$, and in the square
array (in addition to k's which no longer involve an average P) the Q or P of
the cells and the P's in the row coefficients may be omitted, as they cancel each
other out.

Another special but very frequent case is where the social change is not 
given in terms of a difference in means, Si and S 2 , but in terms of a difference 
in percentages, as when a literacy rate rises from 30% to 40%. A percentage 
can be viewed as a mean of a two-category, all-or-none, present-or-absent 
variable such as: A, non-A (foreign or native born, literate or illiterate, etc.),
where A is assigned a value of 1 and non-A a value of 0. Then the sum of the
values of A, each times its frequency, divided by the population is both a
proportion and a mean. Its standard error in the percentage, p, form of
expression is then equal to it in the mean form:

$$\sigma_p = \frac{\sqrt{p\,(1.00 - p)}}{\sqrt{P}} = \frac{\sigma_s}{\sqrt{P}} \qquad \left(\text{where } s = 1 \text{ or } 0 \text{ and } \frac{\sum s}{P} = p\right) \qquad (20)$$

so that where $S_i$ in (14) is a percent, $\sqrt{p\,(1.00 - p)}$ should be substituted for $\sigma_i$
(and $\sigma_j$) in (18). In this case the appropriate formula to use for getting $r_{ij}$
in (18) depends on the nature of the distribution of the variable that is expressed
in percentage form. If the distribution is normal, tetrachoric r may be
appropriate, while if the S in percentage form is from a two point distribution,
r from a four fold point surface may be appropriate.
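The identity behind (20) can be checked numerically: the standard deviation of a variable scored 1 or 0 equals the square root of p(1 - p). The sketch below is illustrative only; the function names and the population of 100 with 30 literate members are assumptions made for the example.

```python
from math import sqrt
from statistics import pstdev


def proportion_sd(p):
    """Standard deviation of a 1-or-0 variable whose mean is the proportion p."""
    return sqrt(p * (1.0 - p))


def proportion_se(p, population):
    """Standard error of the proportion p observed in `population` cases, as in (20)."""
    return proportion_sd(p) / sqrt(population)


# Hypothetical survey: 30 literate (score 1) and 70 illiterate (score 0) persons.
scores = [1] * 30 + [0] * 70
p = sum(scores) / len(scores)

print(proportion_sd(p), pstdev(scores))  # the two standard deviations agree
print(proportion_se(p, len(scores)))
```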

In all the above cases the usual interpretation of the significance of f in respect
to sampling errors may be used, entering a normal probability table with a
given $\sigma_f$ from (18) and reading the probability of such an f occurring by chance.⁴

For a numerical illustration of this formula (18), consider the case of two
villages, the statistical significance of whose momentums of a social change are
to be determined. The data are from a study¹ of Syrian villages where an


⁴ Mr. Wilks comments here that “there is a more exact and rigorous test for comparing
the two sets of S's which enter into a pair of M's or F's, which involves some recent
statistical theory, but it is doubtful if the extra refinement is worth while at this stage
of sociometric development.”





itinerant Health Clinic in two years changed the average hygienic status of the 
families in each village by amounts of score (on a scale of 1 to 1000 points, 
devised for this study) as indicated in the table below. 


                                                     Village A    Village B

Mean score in 1931 = S_1 =                              253          321
              1933 = S_2 =                              304          528
Population (families) in 1931 = P_1 =                    46           46
                         1933 = P_2 =                    40           32
Standard deviation of scores in 1931 = σ_1 =             54           39
                                1933 = σ_2 =             58           70
Families common to both censuses = Q_12 =                40           32
Correlation of scores from the 2 dates = r_12 =         .00          .19
k_M1 = -(P_1 + P_2) / 2Y_(2-1) =                      -21.5        -19.5
k_M2 = -k_M1 =                                         21.5         19.5
k_M1 σ_1 / P_1 =                                     -25.24       -16.53
k_M2 σ_2 / P_2 =                                      31.17        42.65
Q_12 r_12 =                                               0         6.08
σ_M'(2-1) =                                             261         249*
Momentum = M'_(2-1) =                                 1,097        4,037
Significance ratio =                                    4.2         16.2


* The calculation of this σ by (18) may be illustrated in detail:

Village B

  i   Coefficients   j →   1                 2
                           -16.53            42.65

  1   -16.53               46 (= P_1)        6.08 (= Qr)
                           (12,571)          (-4,286)

  2   42.65                6.08 (= Qr)       32 (= P_2)
                           (-4,286)          (58,208)

  Σ( ) = 62,207

  σ_M'(2-1) = 249


The momentum of the movement towards improved hygiene achieved in
village A is 4.2 times its standard error, while that of village B is 16.2 times
its standard error. The excess momentum of village B over village A is 8.1
times the standard error of their difference in momenta. Since
all three of these significance ratios are well over 3, the conclusion is that the
observed momenta and difference of momenta are statistically significant and
cannot reasonably be due to sampling fluctuations. It may be noted that the
significance ratios for the amounts of this social change, the difference in mean
scores, are in close agreement with the above figures, being 4.1 and 15.9 for



villages A and B respectively, instead of 4.2 and 16.2 as above. These
discrepancies of .1 and .3 in the statistical significance of these social changes,
compared with the corresponding social momenta, are accounted for by the fact
that the shift in the size of the population is allowed for in our formula for
the case of momenta and is not considered in the usual formula for the case of
social change.
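The square-array computation of (18) can be reproduced for the two villages with a short routine. The sketch below takes the figures from the table of the illustration; the function name and the list-of-lists layout for Q and r are assumptions made for the example, and the standard error of the difference assumes the two villages share no families (Q = 0 between them).

```python
from math import sqrt


def generalized_se(k, sigma, pop, q, r):
    """Standard error of f = sum of S_i * k_i by the square array of (18).

    k, sigma, pop are per-survey lists; q[i][j] is the population common to
    surveys i and j (q[i][i] is the whole population of survey i) and r[i][j]
    the correlation between them (r[i][i] = 1).
    """
    n = len(k)
    coef = [k[i] * sigma[i] / pop[i] for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += q[i][j] * r[i][j] * coef[i] * coef[j]
    return sqrt(total)


# Village A: P = 46, 40; sigma = 54, 58; Q_12 = 40; r_12 = .00; k = -21.5, 21.5
se_a = generalized_se([-21.5, 21.5], [54, 58], [46, 40],
                      [[46, 40], [40, 40]], [[1.0, 0.0], [0.0, 1.0]])

# Village B: P = 46, 32; sigma = 39, 70; Q_12 = 32; r_12 = .19; k = -19.5, 19.5
se_b = generalized_se([-19.5, 19.5], [39, 70], [46, 32],
                      [[46, 32], [32, 32]], [[1.0, 0.19], [0.19, 1.0]])

momentum_a = 43 * (304 - 253) / 2   # average population times velocity
momentum_b = 39 * (528 - 321) / 2

# Different populations: the cross terms vanish and the variances add.
se_difference = sqrt(se_a ** 2 + se_b ** 2)

print(round(se_a), round(se_b))
print(round(momentum_a / se_a, 1), round(momentum_b / se_b, 1))
print(round((momentum_b - momentum_a) / se_difference, 1))
```

Carrying the coefficients at full precision rather than rounding them to two decimals changes the sum of the cell products slightly, but not the standard errors as rounded in the table.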

A minimum of three measurements of one population is necessary to deter¬ 
mine a social force. To determine its standard error all the correlations must 
be secured between every pair of measurements, each correlation derived from 
the part of the total population that is common to that pair of measurements. 
Obviously the data as currently reported from surveys and censuses and statisti¬ 
cal bureaus do not meet these specifications. More rigorous analysis of social 
data and reporting of correlations in it is a prerequisite to the measurement of 
social forces and their significance. 

American University of Beirut, Syria. 



AN APPROXIMATION TO “STUDENT'S” DISTRIBUTION*


By Walter A. Hendricks 
I. Introduction 

The function commonly known as “Student's” distribution occupies a
prominent position among the classic contributions to the field of statistics, not only
for its intrinsic value but also for the stimulus which it gave to statistical
research at the time of its discovery.

The function, which may be written in the form,

(1) $$dF_z = \frac{1}{B[\tfrac12 (n - 1), \tfrac12]}\,(1 + z^2)^{-\frac{n}{2}}\,dz,$$

gives the distribution of the ratio, z, of the estimated arithmetic mean, $\bar{x}$, to the
estimated standard deviation, s, for samples of n observations drawn from the
normal universe specified by the arithmetic mean, zero, and the standard
deviation, σ. This function, together with a table of values of its integral, was
given by “Student.”

In view of the fact that similar distributions were subsequently found by
Fisher to arise in a larger variety of practical problems than was originally
supposed, a table of values of a new integral was later given by “Student,”
in which the distribution of a variable, t, defined by the relation,

(2) $$t = (n - 1)^{\frac12} z,$$

rather than the distribution of z itself, was considered. Another table giving
the distribution of t, in a form intended to be more convenient for use by
research workers wishing to apply statistical methods to experimental data, was
later given by Fisher.

The integration of functions of the type defined by equation (1) involves
considerable labor, a fact which has been somewhat embarrassing to practical
statisticians interested in the distributions of z and t for values of n larger than
those included in the above-mentioned tables. The recent appearance of
Tables of the Incomplete Beta-Function, prepared under the direction of
Pearson, has considerably alleviated the difficulty, but the requirements of
certain practical problems are not easily satisfied even with the aid of these
tables. Consequently, simple approximations to the distributions of z and t,

* A thesis submitted to the Faculty of the Columbian College of The George
Washington University in part satisfaction of the requirements for the degree of
Master of Arts.




which will be sufficiently accurate for most practical purposes, should be of 
some interest. 

According to “Student,” the distribution of z tends to approach a normal
curve with a standard deviation of $(n - 3)^{-\frac12}$ for values of n greater than 10.
However, Deming and Birge have recently suggested that the distribution
tends to approach a normal curve with a standard deviation of $(n - 1\tfrac12)^{-\frac12}$.

This thesis presents a simple approximation to the distribution of z, which 
can be readily extended to the distribution of t and which will give more accu¬ 
rate results than either of the above approximations. 


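II. Approximation to the Distribution of z

The approximation presented here is based upon the assumption that, for
large values of n, the distribution of s tends to approach a normal curve with the
arithmetic mean, $\bar{s}$, and the standard deviation, $\sigma/(2n)^{\frac12}$, that is,

(3) $$dF_s = \frac{n^{\frac12}}{\pi^{\frac12}\,\sigma}\,e^{-\frac{n}{\sigma^2}(s - \bar{s})^2}\,ds.$$

Since the distribution of the estimated arithmetic mean, $\bar{x}$, is known to be
normal, with the standard deviation, $\sigma/n^{\frac12}$, we have for the joint distribution of
s and $\bar{x}$:

(4) $$dF_{s\bar{x}} = \frac{n}{2^{\frac12}\,\pi\,\sigma^2}\,e^{-\frac{n}{\sigma^2}(s - \bar{s})^2 - \frac{n}{2\sigma^2}\bar{x}^2}\,ds\,d\bar{x}.$$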

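$\bar{s}$ may be expressed in terms of n and σ by the well-known relation,

(5) $$\bar{s} = c_n\,\sigma,$$

in which the factor, $c_n$, is defined by the formula,

(6) $$c_n = \frac{2^{\frac12}\,\Gamma(\tfrac12 n)}{n^{\frac12}\,\Gamma[\tfrac12 (n - 1)]}.$$

If we write $c_n \sigma$ in place of $\bar{s}$ in equation (4) and make the transformation,

(7) $$\bar{x} = sz,$$

we have for the joint distribution of s and z:

(8) $$dF_{sz} = \frac{n}{2^{\frac12}\,\pi\,\sigma^2}\,e^{-\frac{n}{\sigma^2}(s - c_n\sigma)^2 - \frac{n}{2\sigma^2}s^2 z^2}\,s\,ds\,dz.$$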


To find the distribution of z, all that is necessary is to write: 


(9) 


dF, = k p” s dz, 



in which: 




, n - fUJn* 


' ' »- i. 

$$b = 2^{\frac12}\,n^{\frac12}\,c_n\,(z^2 + 2)^{-\frac12}.$$

The integral in brackets in equation (9) can be evaluated without any diffi¬ 
culty. We have: 

( 11 ) f 


Substituting this value in equation (9) and replacing k, a, and b by the
quantities which they represent, we obtain the following expression for the
distribution of z:

(12) $$dF_z = \frac{2\,n^{\frac12}\,c_n}{\pi^{\frac12}}\,e^{-\frac{n c_n^2 z^2}{z^2 + 2}}\,(z^2 + 2)^{-\frac32}\,dz.$$


If we now define a new variable, u, by the relation,

(13) $$u^2 = \frac{2\,n\,c_n^2\,z^2}{z^2 + 2},$$

and make the appropriate substitutions in equation (12), we have, for the
distribution function of u:

(14) $$dF_u = \frac{1}{(2\pi)^{\frac12}}\,e^{-\frac{u^2}{2}}\,du.$$

Equation (14) is obviously a normal curve with unit standard deviation. We
have thus deduced the interesting fact that, for values of n sufficiently large so
that the distribution of s may be represented by a normal curve, the quantity,
$\frac{2^{\frac12}\,n^{\frac12}\,c_n\,z}{(z^2 + 2)^{\frac12}}$, is distributed as a normal deviate with unit standard deviation.

The accuracy of this approximation as compared with that of the
approximation suggested by “Student” and that of the more recent approximation
suggested by Deming and Birge may now be considered. As previously stated,
the “Student” approximation is based on the assumption that the quantity,
$(n - 3)^{\frac12} z$, is distributed as a normal deviate with unit standard deviation for
values of n greater than 10, while that suggested by Deming and Birge is based
on the assumption that the quantity, $(n - 1\tfrac12)^{\frac12} z$, is so distributed.

Table 1* gives values of the integral, $I_z$, defined by:

(15) $$I_z = \frac{1}{B[\tfrac12 (n - 1), \tfrac12]} \int_{-\infty}^{z} (1 + z^2)^{-\frac{n}{2}}\,dz,$$

* All tables and charts to which reference is made are to be found in the Appendix.





for the case, n = 10, together with the corresponding approximate values
obtained by making use of the three approximations suggested by “Student,”
Deming and Birge, and the present author, respectively. The exact values
and those obtained by the “Student” approximation were derived from values
calculated by “Student” and given by Pearson. All other data in the table
were calculated by the present author.

An inspection of Table 1 shows that the values of $I_z$ based on the
approximation presented in this thesis agree very well with the corresponding exact values.
The agreement is better than that found in the case of either of the other two
approximations. The Deming and Birge approximation gives better results
than the “Student” approximation for values of z in the neighborhood of zero,
but for other values of z the opposite is true.


III. Approximation to the Distribution of t

Since tables giving the distribution of the variable, t, have largely superseded
those giving the distribution of z in practical statistical work, the feasibility of
applying the above three approximations to the distribution of t is worthy of
consideration.

The variable, t, has already been defined in terms of n and z by equation (2).
If, in equation (12), we make the transformation,

(16) $$z = (n - 1)^{-\frac12}\,t,$$

we have, for the distribution function of t:

(17) $$dF_t = \frac{2\,n^{\frac12}\,(n - 1)\,c_n}{\pi^{\frac12}}\,e^{-\frac{n c_n^2 t^2}{t^2 + 2(n - 1)}}\,[t^2 + 2(n - 1)]^{-\frac32}\,dt.$$


If we now define a variable, v, by the relation,

(18) $$v^2 = \frac{2\,n\,c_n^2\,t^2}{t^2 + 2(n - 1)},$$

we have, for the distribution function of v:

(19) $$dF_v = \frac{1}{(2\pi)^{\frac12}}\,e^{-\frac{v^2}{2}}\,dv.$$

Equation (19) shows that, for values of n sufficiently large so that the
distribution of s may be represented by a normal curve, the quantity,
$\frac{2^{\frac12}\,n^{\frac12}\,c_n\,t}{[t^2 + 2(n - 1)]^{\frac12}}$,
is distributed as a normal deviate with unit standard deviation. On the other
hand, if we assume with “Student” that, for large values of n, the quantity,
$(n - 3)^{\frac12} z$, is normally distributed about zero with unit standard deviation, we
should expect to find that the quantity, $\frac{(n - 3)^{\frac12}}{(n - 1)^{\frac12}}\,t$, is also distributed as a normal





deviate with unit standard deviation. If the Deming and Birge approximation
to the distribution of z is assumed to be valid, we should expect to find that the
quantity, $\frac{(n - 1\frac12)^{\frac12}}{(n - 1)^{\frac12}}\,t$, is distributed as a normal deviate with unit standard
deviation.

To test the accuracy of each of these three approximations to the distribution
of t, we may make use of the well-known table of values of t given by Fisher.
This table is so constructed that a value of t corresponding to a given number of
“degrees of freedom” and a given value of “P” may be read from the table,
where P is defined by the relation,

(20) $$P = 1 - \frac{2}{(n - 1)^{\frac12}\,B[\tfrac12 (n - 1), \tfrac12]} \int_0^t \left(1 + \frac{t^2}{n - 1}\right)^{-\frac{n}{2}} dt.$$


The entries in the last line of the table, corresponding to an infinite number of
“degrees of freedom,” are the deviates of a normal curve with unit standard
deviation.

To test the accuracy of the “Student” approximation, we may calculate the
entries for a line of this table, corresponding to n - 1 “degrees of freedom,” by
multiplying the entries in the last line of the table by $\frac{(n - 1)^{\frac12}}{(n - 3)^{\frac12}}$. These
approximate values of t may then be compared with the exact values given in the table.
The accuracy of the Deming and Birge approximation may be tested in the same
manner, except that in this case the entries in the last line of the table should be
multiplied by $\frac{(n - 1)^{\frac12}}{(n - 1\frac12)^{\frac12}}$. To test the accuracy of the approximation given by
equation (19), we may calculate the values of t corresponding to n - 1 “degrees
of freedom” by means of the relation,

(21) $$t^2 = \frac{2\,(n - 1)\,v^2}{2\,n\,c_n^2 - v^2},$$

in which the entries in the last line of the table are to be taken as the values of v.

Table 2 gives the exact values of t corresponding to the values of P given in 
Fisher's table for n = 10, together with the approximate values calculated by 
means of each of the above three approximations. This comparison of the 
accuracies of the three approximations is equivalent to the comparisons 
presented in Table 1. The conclusions which may be drawn are in agreement with 
those which have already been drawn from that table. 

In order to test the behavior of each of the approximations for a larger value 
of n, values of t corresponding to the different values of P were calculated for 
n = 30. The results are presented in Table 3. The rank of each of the three 
approximations, with regard to accuracy, for n = 30 is the same as for n = 10. 
Although all three give more accurate results for the larger value of n, the 
superiority of the approximation presented in this thesis is quite apparent. 





For extremely large values of n, all three approximations evidently tend to 
become one-hundred percent accurate, for the distribution of t tends to become 
normal as n is increased indefinitely. In the case of the “Student” and Deming 
and Birge approximations, the ratios, $\frac{(n-1)^{1/2}}{(n-3)^{1/2}}$ and 
$\frac{(n-1)^{1/2}}{(n-1\frac{1}{2})^{1/2}}$, obviously approach unity as n becomes 
very large. The approximate value of t given by equation (21) also tends to approach 
the normal deviate, v, as n is increased, for we have: 

$$\lim_{n\to\infty} t^2 = \lim_{n\to\infty} \frac{2(n-1)v^2}{2nc_n^2 - v^2} = \lim_{n\to\infty} \left[\frac{2nv^2}{2nc_n^2 - v^2} - \frac{2v^2}{2nc_n^2 - v^2}\right] = \lim_{n\to\infty} \frac{v^2}{c_n^2 - \frac{v^2}{2n}} = v^2. \tag{22}$$

The last step follows because $c_n$ approaches unity as n is increased. 
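
The limit in equation (22) can also be checked numerically. The sketch below is illustrative only; the function names are hypothetical, and $c_n$ is again assumed to equal $\sqrt{2/n}\,\Gamma(n/2)/\Gamma((n-1)/2)$, which is not restated in this section.

```python
import math
from statistics import NormalDist


def c_n(n):
    # Assumed closed form: sqrt(2/n) * Gamma(n/2) / Gamma((n-1)/2).
    return math.sqrt(2.0 / n) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def t_from_v(n, v):
    # Positive root of equation (21): t^2 = 2(n - 1) v^2 / (2 n c_n^2 - v^2).
    return math.sqrt(2 * (n - 1) * v**2 / (2 * n * c_n(n) ** 2 - v**2))


v = NormalDist().inv_cdf(0.975)  # normal deviate corresponding to P = .05
for n in (10, 30, 100, 1000, 100000):
    # The printed t should move toward v as n grows.
    print(n, round(t_from_v(n, v), 4), round(v, 4))
```

The printed values of t decrease toward v as n is increased, which is the behavior equation (22) describes.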


IV. Discussion 


The greater accuracy of the approximation to the distribution of z presented 
in this thesis apparently can not be explained by the hypothesis that the 
distribution of s becomes normal more rapidly than the distribution of z as n is increased. 
Table 4 presents values of the ordinates of the normal curve with unit standard 
deviation, together with the corresponding ordinates of the exact distributions 
of the quantities, $\frac{2^{1/2}n^{1/2}}{\sigma}(s-\bar{s})$, $(n-3)^{1/2}z$, 
$(n-1\frac{1}{2})^{1/2}z$, and $2^{1/2}n^{1/2}c_n\frac{z}{(z^2+2)^{1/2}}$, for 
n = 10. Although the distribution of $\frac{2^{1/2}n^{1/2}}{\sigma}(s-\bar{s})$ seems 
to follow the normal curve more closely than does the distribution of $(n-3)^{1/2}z$, 
the opposite seems to be true in the case of the distribution of $(n-1\frac{1}{2})^{1/2}z$. 
The distribution of $2^{1/2}n^{1/2}c_n\frac{z}{(z^2+2)^{1/2}}$, however, follows the 
normal curve quite closely. 

The behavior of these distributions for n = 10 can be observed more easily 
in Figures 1, 2, and 3 in which the frequency curves of 
$\frac{2^{1/2}n^{1/2}}{\sigma}(s-\bar{s})$, $(n-3)^{1/2}z$, and $(n-1\frac{1}{2})^{1/2}z$ 
are respectively plotted together with the normal curve with unit standard deviation. 
The frequency curve of $2^{1/2}n^{1/2}c_n\frac{z}{(z^2+2)^{1/2}}$ was not 
plotted because of the fact that this curve follows the normal curve so closely 
that the two curves could not be distinguished when plotted on the scale used in 
the other three charts. 
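
Ordinates of the kind tabulated for the two scaled forms of z can be sketched as follows. This is a hedged illustration, not the author's procedure: the function names are hypothetical, and the density of z is assumed to be $\frac{\Gamma(n/2)}{\pi^{1/2}\,\Gamma((n-1)/2)}(1+z^2)^{-n/2}$, a form that is not restated in this section.

```python
import math


def z_density(z, n):
    """Assumed density of z for sample size n (not restated in this section)."""
    log_const = math.lgamma(n / 2) - math.lgamma((n - 1) / 2) - 0.5 * math.log(math.pi)
    return math.exp(log_const) * (1.0 + z * z) ** (-n / 2)


def scaled_ordinate(u, n, k):
    """Ordinate at u of the exact distribution of k**0.5 * z."""
    root_k = math.sqrt(k)
    # Change of variable u = k^(1/2) z divides the density by k^(1/2).
    return z_density(u / root_k, n) / root_k


n = 10
for u in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    normal = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    print(u, round(normal, 4),
          round(scaled_ordinate(u, n, n - 3), 4),
          round(scaled_ordinate(u, n, n - 1.5), 4))
```

The three printed columns can be compared with the normal-deviate, $(n-3)^{1/2}z$, and $(n-1\frac{1}{2})^{1/2}z$ columns of Table 4.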

The most reasonable conclusion which can be drawn from Table 4 and Figures 
1, 2, and 3 is that the departure of the exact distribution of s from the normal 
curve has very little effect in destroying the normality of the distribution 
of $2^{1/2}n^{1/2}c_n\frac{z}{(z^2+2)^{1/2}}$. 





V. Values of the Factor, $c_n$ 

For the practical application of the approximations to the distributions of 
z and t presented in this thesis, a table of values of the factor, $c_n$, is required. 
Values of this factor, for values of n as high as 100, have been tabulated by 
Pearson and by Shewhart (8). For values of n greater than 100, $c_n$ may be 
calculated accurately to at least five significant figures by the following relation, 
given by Pearson and by Deming and Birge (1): 

$$c_n = 1 - \frac{3}{4n} - \frac{7}{32n^2}. \tag{23}$$

Table 5 presents values of $c_n$ for some large values of n, calculated by the 
present author. For values of n not included in this table, $c_n$ may be calculated 
by means of equation (23) just as rapidly as by interpolation in the table. 
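
A short numerical sketch of this calculation follows. It is an illustration under two stated assumptions, neither of which is spelled out in this section: that $c_n$ equals $\sqrt{2/n}\,\Gamma(n/2)/\Gamma((n-1)/2)$, and that equation (23) reads $c_n = 1 - \frac{3}{4n} - \frac{7}{32n^2}$, a reading that agrees with the entries of Table 5. The function names are hypothetical.

```python
import math


def c_n_exact(n):
    # Assumed closed form: sqrt(2/n) * Gamma(n/2) / Gamma((n-1)/2),
    # evaluated through log-gamma so that large n does not overflow.
    return math.sqrt(2.0 / n) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def c_n_series(n):
    # Equation (23) as read here: c_n = 1 - 3/(4n) - 7/(32 n^2).
    return 1.0 - 3.0 / (4.0 * n) - 7.0 / (32.0 * n * n)


for n in (100, 500, 1000, 10000):
    print(n, f"{c_n_exact(n):.5f}", f"{c_n_series(n):.5f}")
```

Under these assumptions the two columns agree to five decimal places for the values of n shown, and they can be compared with the corresponding entries of Table 5.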


VI. Summary and Conclusions 

For values of n sufficiently large so that the distribution of s may be 
represented by a normal curve, the quantities, 
$2^{1/2}n^{1/2}c_n\frac{z}{(z^2+2)^{1/2}}$ and 
$2^{1/2}n^{1/2}c_n\frac{t}{[t^2+2(n-1)]^{1/2}}$, 
are distributed as normal deviates with unit standard deviation. The results 
obtained by assuming a normal distribution of s are more accurate than those 
obtained by assuming that either $(n-3)^{1/2}z$ or $(n-1\frac{1}{2})^{1/2}z$ is 
distributed as a normal deviate with unit standard deviation. For extremely large 
values of n, the distribution of each of the above quantities tends to approach a 
normal curve with a mean of zero and unit standard deviation. 
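
The two transformations summarized above can be sketched in Python as follows. This is a minimal illustration, not the author's procedure: the function names are hypothetical, $c_n$ is assumed to equal $\sqrt{2/n}\,\Gamma(n/2)/\Gamma((n-1)/2)$ (not restated in this section), and the normal probability integral is taken from `statistics.NormalDist` instead of printed tables.

```python
import math
from statistics import NormalDist


def c_n(n):
    # Assumed closed form: sqrt(2/n) * Gamma(n/2) / Gamma((n-1)/2).
    return math.sqrt(2.0 / n) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def normal_deviate_from_z(z, n):
    # 2^(1/2) n^(1/2) c_n z / (z^2 + 2)^(1/2)
    return math.sqrt(2.0 * n) * c_n(n) * z / math.sqrt(z * z + 2.0)


def normal_deviate_from_t(t, n):
    # 2^(1/2) n^(1/2) c_n t / [t^2 + 2(n - 1)]^(1/2)
    return math.sqrt(2.0 * n) * c_n(n) * t / math.sqrt(t * t + 2.0 * (n - 1))


def approx_probability_integral(z, n):
    # Approximate probability that z does not exceed the given value.
    return NormalDist().cdf(normal_deviate_from_z(z, n))


for z in (-0.2, 0.2, 0.8, 2.0):
    print(z, round(approx_probability_integral(z, 10), 4))
```

For n = 10 the printed probabilities can be compared with the last column of Table 1. The two deviates coincide when t is taken as $(n-1)^{1/2}z$, the relation implied by the multiplying factors used for Table 2.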


VII. References 

(1) Deming, W. Edwards, and R. T. Birge, 1934. On the statistical theory of errors. 
      Reviews of Modern Physics, 6: 119-161. 

(2) Fisher, R. A., 1925. Applications of “Student's” distribution. Metron, 5: 90-104. 

(3) Fisher, R. A., 1934. Statistical Methods for Research Workers, 5th ed. Oliver and 
      Boyd, Edinburgh and London. 

(4) Pearson, Karl, 1915. On the distribution of the standard deviations of small 
      samples. Biometrika, 10: 522-529. 

(5) Pearson, Karl, 1924. Tables For Statisticians And Biometricians, Part I, 2nd ed. 
      Cambridge University Press, Cambridge. 

(6) Pearson, Karl, 1931. Tables For Statisticians And Biometricians, Part II, 2nd ed. 
      Cambridge University Press, Cambridge. 

(7) Pearson, Karl, 1934. Tables Of The Incomplete Beta-Function. Cambridge 
      University Press, Cambridge. 

(8) Shewhart, W. A., 1931. Economic Control Of Quality Of Manufactured Product. 
      D. Van Nostrand Co., New York. 

(9) “Student,” 1908. The probable error of a mean. Biometrika, 6: 1-25. 

(10) “Student,” 1917. Tables for estimating the probability that the mean of a unique 
      sample of observations lies between -∞ and any given distance of the mean of 
      the population from which the sample is drawn. Biometrika, 11: 414-417. 

(11) “Student,” 1925. New tables for testing the significance of observations. Metron, 
      5: 105-120. 

The George Washington University. 


VIII. Appendix 

TABLE 1 

Exact values of $I_z$, and approximate values, derived from tables of the normal 
probability integral, for n = 10 

                                  $I_z$
   z     Exact value    “Student”        Deming & Birge   Hendricks
                        approximation    approximation    approximation

 -2.0       .0001          .0000             .0000           .0004
 -1.8       .0002          .0000             .0000           .0006
 -1.6       .0005          .0000             .0000           .0010
 -1.4       .0011          .0001             .0000           .0018
 -1.2
 -1.0
  -.8       .0199                            .0098
  -.6                                        .0401
  -.4
  -.2       .2816          .2984             .2799           .2817
   .0       .5000          .5000             .5000
  +.2       .7184          .7016             .7201           .7183
  +.4       .8696          .8552             .8782           .8693
  +.6       .9473          .9438             .9599           .9465
  +.8                      .9829                             .9789
 +1.0       .9925          .9959             .9982           .9914
 +1.2       .9971          .9993             .9998           .9962
 +1.4       .9989          .9999            1.0000           .9982
 +1.6       .9995         1.0000            1.0000           .9990
 +1.8       .9998         1.0000            1.0000           .9994
 +2.0       .9999         1.0000            1.0000           .9996









TABLE 2 

Exact values of t corresponding to different values of P and approximate values, 
derived from normal deviates, for n = 10 

                                    t
   P     Exact value    “Student”        Deming & Birge   Hendricks
                        approximation    approximation    approximation

  .90        .129           .142              .129            .129
  .80        .261           .287              .261            .261
  .70        .398           .437              .396            .398
  .60        .543           .595              .540            .544
  .50        .703           .765              .694            .703
  .40        .883           .954              .866            .884
  .30       1.100          1.175             1.066           1.104
  .20       1.383          1.453             1.319           1.386
  .10       1.833          1.865             1.693           1.844
  .05       2.262          2.222             2.017           2.290
  .02       2.821          2.638             2.394           2.896
  .01       3.250          2.921             2.650           3.389


TABLE 3 

Exact values of t corresponding to different values of P and approximate values, 
derived from normal deviates, for n = 30 

                                    t
   P     Exact value    “Student”        Deming & Birge   Hendricks
                        approximation    approximation    approximation

  .90        .127           .130              .127            .127
  .80        .256           .263              .256            .256
  .70        .389           .399              .389            .389
  .60        .530           .543              .529            .530
  .50        .683           .699              .680            .683
  .40        .854           .872              .849            .854
  .30       1.055          1.074             1.045           1.055
  .20       1.311          1.328             1.293           1.312
  .10       1.699          1.705             1.659           1.700
  .05       2.045          2.031             1.977           2.047
  .02       2.462          2.411             2.347           2.466
  .01       2.756          2.670             2.598           2.764






TABLE 4 

Ordinates of the normal curve with unit standard deviation and ordinates of the 
exact distribution functions of $\frac{2^{1/2}n^{1/2}}{\sigma}(s-\bar{s})$, $(n-3)^{1/2}z$, 
$(n-1\frac{1}{2})^{1/2}z$, and $2^{1/2}n^{1/2}c_n\frac{z}{(z^2+2)^{1/2}}$ for n = 10 

                      Ordinates of distribution function
Deviation   Normal    $\frac{2^{1/2}n^{1/2}}{\sigma}(s-\bar{s})$   $(n-3)^{1/2}z$   $(n-1\frac{1}{2})^{1/2}z$   $2^{1/2}n^{1/2}c_n\frac{z}{(z^2+2)^{1/2}}$
from mean   deviate

 -3.0       .0044     .0006                                        .0071            .0108                       .0034
 -2.5       .0175     .0085                                        .0181            .0254                       .0156
 -2.0       .0540     .0454                                        .0459            .0581                       .0544
 -1.5       .1295     .1356                                        .1092            .1234                       .1306
 -1.0       .2420     .2663                                        .2256            .2290                       .2426
  -.5       .3521     .3751                                        .3692            .3454                       .3522
   .0       .3989     .3999                                        .4400            .3991                       .3990
  +.5       .3521     .3343                                        .3692            .3454                       .3522
 +1.0       .2420     .2245                                        .2256            .2290                       .2426
 +1.5       .1295     .1233                                        .1092            .1234                       .1306
 +2.0       .0540     .0560                                        .0459            .0581                       .0544
 +2.5       .0175     .0213                                        .0181            .0254                       .0156
 +3.0       .0044     .0068                                        .0071            .0108                       .0034


TABLE 5 

Values of $c_n$ for large values of n 

      n     $c_n$             n     $c_n$

    100    .99248           900    .99917
    150    .99499          1000    .99925
    200    .99624          2000    .99962
    250    .99700          3000    .99975
    300    .99750          4000    .99981
    350    .99786          5000    .99985
    400    .99812         10000    .99992
    450    .99833         20000    .99996
    500    .99850         30000    .99997
    600    .99875         40000    .99998
    700    .99893         50000    .99998
    800    .99906        100000    .99999




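Fig. 1. Exact Distribution of $\frac{2^{1/2}n^{1/2}}{\sigma}(s-\bar{s})$ for n = 10 and Normal Curve with 
Unit Standard Deviation 
[Plot not reproduced. Curves: exact distribution; normal curve. Vertical axis: ordinate of distribution.] 

Fig. 2. Exact Distribution of $(n-3)^{1/2}z$ for n = 10 and Normal Curve with 
Unit Standard Deviation 
[Plot not reproduced. Curves: exact distribution; normal curve. Vertical axis: ordinate of distribution.] 

Fig. 3. Exact Distribution of $(n-1\frac{1}{2})^{1/2}z$ for n = 10 and Normal Curve with 
Unit Standard Deviation 
[Plot not reproduced. Curves: exact distribution; normal curve. Vertical axis: ordinate of distribution.] 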









